Preserving data for the long run
They say data is king. But the EU-funded ARCHIVER project says, ‘actually, it’s complicated’. “Data is king when digital preservation can ensure continued access to the research for as long as necessary while maintaining its intellectual control,” says João Fernandes, project leader at the European Organisation for Nuclear Research (CERN) and ARCHIVER project coordinator. “Unfortunately, due to a lack of cost planning and to solutions falling short, most research projects struggle to properly preserve their data for the long run.” To put data back on its throne, the ARCHIVER project designed, prototyped and piloted innovative solutions for the long-term digital preservation (LTDP) of scientific data sets.
Actual solutions available now
The project’s work was driven by the real-world needs of a diverse range of stakeholders, including CERN, DESY, EMBL-EBI and PIC. “We started by identifying the current gaps in the preservation services offered to the public research sector, taking lessons learned from past initiatives,” explains Fernandes. “We then put an R&D agile model into practice for multiple scientific disciplines and designed for both public research organisations and experts in data preservation.” One of the outcomes of this work is the Arkivum software-as-a-service solution for LTDP. Able to support the archiving, preservation and access of vast and highly valuable scientific data sets, the solution is particularly well suited to disciplines such as astronomy, particle physics and genomics. “The service is remarkable in that it can archive and preserve up to petabyte-scale data sets in a cost-effective and environmentally sustainable manner,” notes Fernandes. Another LTDP solution developed during the project is LIBNOVA LABDRIVE. “Prior to LIBNOVA LABDRIVE, many organisations would use a siloed approach to data preservation, with each data set, department or unit using multiple, disaggregated systems,” adds Fernandes. “This new product allows everyone to keep their content in a single repository that can be easily adapted to the particularities of each data set, thus unifying all data into one platform.” Both the Arkivum and LABDRIVE solutions are accessible via the European Open Science Cloud (EOSC) platform.
Nothing short of a game changer
According to Fernandes, the services coming out of the ARCHIVER project will deliver exceptional results. They will also have a potentially significant immediate impact on at least 18 pan-European infrastructures that serve a collective 1.7 million European researchers. Then there are the 70 million scientific, IT and other professionals who are expected to make use of the services via the EOSC.
That’s a lot of people – and even more data
“ARCHIVER represents nothing short of a game changer in how to approach long-term research data management,” concludes Fernandes. “It also ensures the data meets the FAIR principles of findability, accessibility, interoperability and reusability.” Thanks to the project’s efficient R&D methodology, use of affordable technologies, environmental sustainability, and sensible reduction in the resources needed to archive and preserve large amounts of information, data is once again king. But it’s not just data that is wearing a crown. The ARCHIVER project itself was crowned with the 2022 Digital Preservation Award for Collaboration and Cooperation. The award celebrates the project’s significant collaboration across institutional, professional, sectoral and geographic boundaries and the demonstrable impact this collaboration has had on digital preservation.
Keywords
ARCHIVER, data, digital preservation, research, archiving, scientific data sets, long-term digital preservation, software as a service, European Open Science Cloud