Periodic Reporting for period 3 - PaNOSC (Photon and Neutron Open Science Cloud)
Reporting period: 2021-12-01 to 2022-11-30
PaNOSC is one of five science cluster projects supported by the INFRAEOSC-04 call. The partners joining forces in the PaNOSC project are ESRF (synchrotron), ILL (neutron source), EuXFEL (free electron laser), ELI-DC (multi-site optical laser sources), and ESS (neutron source), together with the ERIC CERIC-ERIC (3 photon and neutron sources) and the e-infrastructures EGI (partner) and GÉANT (contributor). The participating RIs have very different levels of maturity, ranging from operational facilities currently being upgraded (ESRF, ILL), through facilities recently put into operation (CERIC-ERIC, EuXFEL) or just starting up (ELI), to facilities still under construction (ESS). PaNOSC will share best practices and expertise to bring all partner sites rapidly up to the same level of FAIR data management.
The five key objectives of the project are as follows:
1. Link the participating Research Infrastructures (RIs) to the EOSC by exposing data, providing services, and promoting its use by the scientific community. Technically this involves persistent identity management, access to compute and storage, data transfer and archival. Organisationally it implies a common approach between research infrastructures and provision of an efficient and competent user service.
2. Make scientific data produced at Europe’s major Photon and Neutron sources fully compatible with the FAIR principles by adopting a harmonised data policy and by adding rich and meaningful metadata to the experimental data generated in the RIs. All publicly funded data generated in the RIs will be made open (after an embargo period) and downloadable in accordance with the data policies adopted at the facilities.
3. Provide innovative data services to the users of these facilities locally and to the scientific community at large with the EOSC. Remote data reduction and analysis services will be implemented to help scientists interact with data sets of variable size.
4. Increase the impact of RIs by ensuring data from user experiments can be used beyond the initial scope. Exposing data to the EOSC will allow data sets from different laboratories to be combined across domains and disciplines.
5. Share the outcomes with the national RIs who are observers in the proposal and the scientific community to promote the adoption of FAIR data principles, data stewardship and uptake of the EOSC. The outcome of the work undertaken in PaNOSC will be shared with and promoted in the entire photon and neutron community and beyond.
The release of a new FAIR data policy for PaN sources has generated significant interest on Zenodo, and discussions will now start on its implementation at the PaNOSC partner RIs and its promotion within the community. Metadata harvesting has been enabled by implementing OAI-PMH at all sites and registering the sites with the EOSC projects OpenAIRE and re3data to make the data findable. An electronic logbook developed at the ESRF shows how metadata can be enhanced with a rich description of the experiment.
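As an illustration of what OAI-PMH harvesting involves, the following minimal Python sketch builds a `ListRecords` request and extracts identifiers and titles from a response. It is not the PaNOSC implementation: the endpoint URL, the function names, and the sample response are hypothetical, and only the protocol elements (the `verb`, `metadataPrefix`, and `resumptionToken` parameters and the `oai_dc` metadata format) come from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (base_url is a hypothetical endpoint)."""
    if resumption_token:
        # A follow-up request carries only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) parsed from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=OAI_NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=OAI_NS)
        records.append({"identifier": identifier, "title": title})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return records, (token or None)


# Hand-written sample response for illustration only (not real facility data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:dataset-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("https://data.example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester such as OpenAIRE would repeat the request with each returned resumption token until the repository stops sending one, which is how large catalogues are paged through.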
The Jupyter notebook services have been set up at all partner sites, including EGI. In the future Jupyter services should be standard services offered by the EOSC with support for different hardware resources, including CPUs and GPUs.
Site data portals are being installed and improved at partner sites and allow access to open data and analysis services. Work has started on an HDF5 web viewer. This viewer will have an impact for the wider scientific community once it is integrated in the Jupyter ecosystem.
The ray-tracing simulation software OASYS has been further improved and extended and is now the de facto solution for designing new beamlines at photon sources world-wide.
One of the main objectives of PaNOSC is to connect the Photon and Neutron RIs to the EOSC. A first step consisted of implementing an AAI (Authentication and Authorisation Infrastructure) with GÉANT in such a way as to be compatible with the future EOSC AAI. The same is underway for data transfer, where Globus-online and OneData are being validated. However, it is not yet clear which technology will be part of the EOSC core services.
The pan-learning.org training platform has been installed at ESS, and a number of training activities have been carried out on it.
The COVID-19 pandemic has forced the PaNOSC partners already in operation to offer remote services for all their facilities, including remote experiment control. This has made the PaNOSC outcomes especially important and has raised the priority of, and interest in, PaNOSC and the EOSC at all sites.
Most of the ESFRI roadmap projects, and similarly all national RIs, are facing an exponential growth in research data. PaNOSC is a stepping stone for harnessing the data avalanche, for ensuring that scientific productivity can be maintained or increased, and for making access to data, software, and computing resources seamless through the brokering of the EOSC ecosystem. However, this is not only a technical challenge. Making it a reality requires an intensive coordination effort with the other science cluster projects (EOSC-LIFE, ENVRI-FAIR, ESCAPE and SSHOC) and the ongoing and future INFRAEOSC projects. In this context, PaNOSC will make it possible to gain experience with remote analysis services and to prepare the ground for a pan-European support infrastructure that helps scientists at large interact with FAIR data.
PaNOSC has created the Human Organ Atlas (https://human-organ-atlas.esrf.eu) data portal and published 50 unique open datasets with complete metadata for re-use in a number of scientific fields, including COVID-19 research.