PaNOSC WP3 kicked off at the European Spallation Source
On 23-24 May 2019, the European Spallation Source (ESS) hosted the kick-off meeting of PaNOSC Work Package 3 (Data Catalog Services) at the ESS Data Management and Software Centre (DMSC) in Copenhagen. After opening presentations by all invited partners, the meeting began with an overview of the results of the data catalogue survey circulated among the PaNOSC partner institutions. The survey showed that the research infrastructures (RIs) in the partnership use six different catalogues with little technological overlap. PaNOSC plans to move to common metadata definitions and interoperable catalogues by 2023 and to establish open data services with common APIs, and WP3 plays a vital role in both. The survey and the subsequent roundtable showed that the 6 PaNOSC partners are at varying stages of catalogue use. These range from beginning (ELI) through partial integration (CERIC, ESS) to full integration (ESRF, ILL). None of the partners yet provides open data with full metadata.
Stuart Caunt from ILL presented the use cases collected so far by the partners, as a basis for deciding what should fall within the core scope of the WP. The main objectives will be to expose data for harvesting by EOSC, B2FIND and OpenAIRE, making it available in their repositories, and to develop a common search API that enables targeted searches using domain-specific search terms. Access to embargoed data will require authentication, and this access is needed to deliver the data analysis portal in WP4. The API will therefore allow optional integration of a suitable authentication solution for searches. Services built around the data can be made available through a link or identifier returned by the search. Examples include download, transfer or processing facilities such as Jupyter notebooks.
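The search behaviour described above has three parts: filtering on domain-specific terms, hiding embargoed data from unauthenticated users, and returning an identifier that other services can use. The following is a minimal, hypothetical sketch of that behaviour and not the PaNOSC API. The `Dataset` fields, the `search` function, the range-style filters and the rule that only listed members may see data before the embargo ends are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Dataset:
    # Hypothetical catalogue record; field names are illustrative only.
    pid: str                      # persistent identifier handed back to clients
    title: str
    parameters: dict[str, float]  # domain-specific search terms, e.g. "wavelength"
    embargo_until: date           # data becomes open on this date
    members: set[str] = field(default_factory=set)  # users allowed during embargo


def is_visible(ds: Dataset, user: Optional[str], today: date) -> bool:
    """Open data is visible to everyone; embargoed data only to authenticated members."""
    if today >= ds.embargo_until:
        return True
    return user is not None and user in ds.members


def search(
    catalogue: list[Dataset],
    filters: dict[str, tuple[float, float]],
    user: Optional[str] = None,   # None means an anonymous (unauthenticated) search
    today: Optional[date] = None,
) -> list[str]:
    """Return identifiers of visible datasets whose parameters fall in every (lo, hi) range."""
    if today is None:
        today = date.today()
    hits = []
    for ds in catalogue:
        if not is_visible(ds, user, today):
            continue
        if all(
            name in ds.parameters and lo <= ds.parameters[name] <= hi
            for name, (lo, hi) in filters.items()
        ):
            hits.append(ds.pid)
    return hits
```

Authentication is an optional argument in this sketch. An anonymous caller still gets open results, and supplying a user only widens the result set. The function returns identifiers rather than data, which leaves download, transfer or notebook services to be resolved from the identifier.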
On FAIR data API development, Gareth Murphy from ESS presented a roadmap for starting to provide open data to EOSC-hub. Through EOSC-hub, PaNOSC partners can supply metadata to EOSC data discovery services such as B2FIND and OpenAIRE. By providing an OAI-PMH endpoint and a schema mapper, PaNOSC can support generic metadata specifications (Dublin Core). It can also offer common extensions to be agreed among the partners or with other RIs. ResourceSync would be a more modern alternative to OAI-PMH.
The discussion on catalogue integration focused on distributed facilities such as CERIC-ERIC and ELI. These facilities must also handle catalogue integration internally at facility level, by communicating with, harmonizing and introducing a data catalogue at each member of the infrastructure. CERIC's strategy is to implement a data catalogue and data analysis services first at its Italian partner facility, Elettra, and then to apply the solution to the other CERIC partner facilities. ELI is currently evaluating options for cataloguing data and can benefit from the partners' experience with their diverse set of solutions.
Finally, on the topic of common file formats, all partners agreed to share examples of their current practices. By comparing their various NeXus/HDF5 implementations, the partners aim to identify commonalities that will inform the choice of search criteria for the data API. This exercise also gives the partners an opportunity to develop strategies for making the information in the files more interoperable. That will benefit the consumers of the data, i.e. EOSC analysis services. The partners agreed to hold a face-to-face follow-up workshop at ILL in September. Until then, a series of video conferences will continue the development and knowledge sharing.
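The schema-mapper idea from the FAIR data discussion can be illustrated with a short example. The example maps an internal catalogue record to the `oai_dc` (unqualified Dublin Core) payload that an OAI-PMH endpoint returns for harvesters such as B2FIND or OpenAIRE. This is a minimal sketch, not any partner's implementation. The internal field names in `FIELD_MAP` and the function `to_oai_dc` are hypothetical. Only the `oai_dc` and Dublin Core namespace URIs come from the OAI-PMH and Dublin Core specifications. The OAI-PMH protocol envelope (verbs, headers, resumption tokens) is not shown.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH oai_dc format and the Dublin Core element set.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)
ET.register_namespace("xsi", XSI)

# Hypothetical mapping from internal catalogue fields to Dublin Core elements.
# Partner-agreed extensions would need a separate, richer metadata format;
# fields not listed here are simply left out of the oai_dc record.
FIELD_MAP = {
    "title": "title",
    "authors": "creator",
    "facility": "publisher",
    "release_date": "date",
    "doi": "identifier",
    "summary": "description",
}


def to_oai_dc(record: dict) -> ET.Element:
    """Build an <oai_dc:dc> element from a catalogue record (a plain dict)."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    root.set(
        f"{{{XSI}}}schemaLocation",
        f"{OAI_DC} http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
    )
    for source_field, dc_element in FIELD_MAP.items():
        value = record.get(source_field)
        if value is None:
            continue  # missing fields produce no element
        # Dublin Core elements are repeatable, so lists become several elements.
        for item in value if isinstance(value, list) else [value]:
            ET.SubElement(root, f"{{{DC}}}{dc_element}").text = str(item)
    return root
```

Calling `ET.tostring(to_oai_dc(record), encoding="unicode")` produces the XML fragment an endpoint would embed in its `<metadata>` element. Keeping the mapping in a table rather than in code would let partners extend it without touching the serializer.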
Keywords
EOSC, data catalog, data catalog services, PaNOSC, European Open Science Cloud, Open Science, Open Data