A new approach to digital content preservation
The increasing digitalisation of content is a double-edged sword. Sure, it enables instant access to a dizzying amount of information. But this comes at a considerable cost: unlike books, digital content is mostly short-lived. The digital ecosystem evolves so fast – with changes in policies, legal frameworks, professional practices, user expectations and behaviour, and semantics – that long-term access to content cannot always be guaranteed.
This is where the concept of ‘preservation by design’, advocated by the PERICLES (Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics) project, comes into play. ‘“Preservation by design” means that preservation is not an “afterthought” or an action that comes after the end of the “active life” of a digital object,’ says Dr Marc Hedges, senior lecturer at the Department of Digital Humanities at King's College London and coordinator of PERICLES. ‘This post-custodial approach implies pro-active and not simply reactive management of digital assets, where preservation is part of the active life of an object.’
Under such an approach, an object is considered ‘alive’, passing through different phases or having multiple lives, as long as it has not been fully deleted. ‘Preservation by design would be part of the data’s complete lifetime, from creation to deletion,’ Dr Hedges points out.
Beyond technical preservation
Existing approaches to preservation typically focus solely on the technical environment necessary for long-term archiving. Whilst the literature and standards underline that the purpose of such archives is not only technical but also organisational, institutions tend to overemphasise technical aspects.
PERICLES proceeds differently: it investigates how changes to any element of the environment – be it user communities, the institution itself, or the larger social and cultural context – affect the usefulness and interpretation of the digital object, and how such change can be managed. ‘A goal of PERICLES was to investigate the hypothesis that, if we can capture the dependencies of a specific object within its ecosystem and implement the resulting models and ontologies into an infrastructural layer, we would be able to analyse the impact of change on the access and reuse of an object and take appropriate mitigating action,’ says Dr Hedges.
To this end, the team developed an integration framework and architecture. Its purpose is to demonstrate workflows and components that would allow a ‘change management’ layer to be introduced into existing repository systems. Test-beds were delivered to verify the validity of the approach under different scenarios. ‘Components that we could not find on the market, we developed ourselves,’ Dr Hedges explains. ‘Examples include domain ontologies and ecosystem models, a significant environment information extractor and other tools to populate the ontologies, an entity registry combined with a model repository, a process compiler, an appraisal tool and a policy editor.’ Other components can also be considered as long as they fulfil the roles defined by PERICLES’ integration framework.
To validate their approach, Dr Hedges and his team needed to use complex data and focused on two domain ontologies: digital art and media, and space science. ‘The institutions managing this data seem to be at opposite extremities in terms of responsibilities. TATE aims to preserve its digital art collections and brings along exceptional conservation expertise, whilst space operation centre B.USOC represents a sector where preservation is mostly restricted to storage.
On the other hand, ‘re-use’ is not a common concept in art, whilst this notion has grown rapidly within the science domain,’ says Dr Hedges.
Among the project’s most exciting outcomes, Dr Hedges mentions the understanding that knowledge of the dependencies between an object and its ecosystem results from combining diverse areas of expertise. Translating that knowledge into semantic models and ontologies, understandable by both humans and machines, allows for a new form of collaboration between the two. ‘This calls for a new alliance between the human professional and the technological system, with the human actor taking on an active role, such as informing the models and making decisions based on the system,’ Dr Hedges enthuses. ‘We are far away from the vision of technology as a mistrusted friend or the magic wand that does it all on its own.’
Looking ahead
Whilst the research in its current state is still highly experimental, airlines, hospitals, and financial and governmental organisations that handle various digital assets and want to keep them accessible over a long period are likely to be interested in PERICLES research. In April, the team will publish a white paper presenting its approach to organisations that are not primarily engaged with preservation but still face challenges in this regard. ‘In parallel we have been working on creating a portal called PRESERVEWARE – a digital preservation hub – that would help people to find appropriate preservation tools, including those that PERICLES produced,’ Dr Hedges explains. ‘We looked at existing registries, and, although they are excellent, we believe that the better the field is covered the higher the chance that professionals looking for tools will actually find them.’
Keywords
PERICLES, digital content, data preservation, preservation by design, semantics, digital art, space science, repository, PRESERVEWARE