CORDIS - EU research results

Molecular Dynamics Data Bank. The European Repository for Biosimulation Data

Periodic Reporting for period 1 - MDDB (Molecular Dynamics Data Bank. The European Repository for Biosimulation Data)

Reporting period: 2023-03-01 to 2024-02-29

Molecular dynamics (MD) has transitioned from a specialised field to a widely used method across various scientific disciplines. MD simulations now actively contribute to understanding the flexibility and motions within biological macromolecules, calculating binding affinities, and predicting changes in functional properties of a molecule. However, challenges persist in effectively managing the large volumes of generated data, often due to inadequate data sharing practices and infrastructure, leading to loss of potentially transformative insights.

The MDDB project aims to address these challenges by establishing a comprehensive repository for MD simulations alongside associated analysis tools. This initiative seeks to integrate MD simulations into Life Sciences research, optimising computational resources and enhancing data analysis. The project’s primary objectives are (i) creating a centralised repository to streamline the use of supercomputing resources, ensuring efficient data storage and management; (ii) supporting extensive analysis and meta-analysis of MD trajectories; (iii) facilitating collaborations through rapid and efficient information sharing among research groups; (iv) bridging gaps between the MD field and adjacent scientific communities, thus promoting interdisciplinary research and broadening the impact of MD simulations.

MDDB will make sure MD data is Findable, Accessible, Interoperable, and Reusable (FAIR), aligning the simulation community with established data management practices in structural biology and bioinformatics. The project will implement good practices for data collection, define interoperability standards, and establish quality checking procedures. Ultimately, the MDDB project aims to promote the efficient and widespread use of MD simulations, enhancing their contribution to solving complex biological and chemical problems, and optimising existing resources by making high-quality data accessible to a broader research community, thereby leading to innovations that can benefit society at large.

During the first year of the project, MDDB has made substantial progress. Our focus in data management has been on identifying trajectory storage formats and metadata topology requirements, informed by input gathered through workshops and meetings with scientists and developers representing various institutions and tools within the MD simulation community. These discussions have led to a definition of the requirements and desirable properties for different categories of data, including raw trajectory data compression, metadata ontologies, provenance records, and strategies for data multiplexing and retrieval. We have taken the lead on establishing community-wide standards for biomolecular data compression, aiming for greater efficiency than existing methods. Our efforts in ontology development have focused on unifying simulation metadata, parameters, and molecule/force field specifications into a single format in which each component can be included optionally, as needed. In parallel, we have started developing key-value pairs for simulation parameters, organising the fields hierarchically by algorithm and coordinating the nomenclature with community consensus.
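
As an illustration of what hierarchical key-value pairs for simulation parameters could look like, the following is a minimal Python sketch. It is not the MDDB schema: the group names (`integrator`, `thermostat`), the field names, and the dotted-key convention are all assumptions made for this example. The sketch groups parameters under the algorithm they belong to and converts between the nested form and flat key-value pairs.

```python
from typing import Any, Dict


def flatten(params: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested parameter groups into flat, dot-separated key-value pairs."""
    flat: Dict[str, Any] = {}
    for key, value in params.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # A nested dict is a parameter group: recurse with the longer path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the nested hierarchy from dot-separated keys."""
    nested: Dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical parameter set, grouped by the algorithm each field belongs to.
example = {
    "integrator": {"algorithm": "leap-frog", "timestep_ps": 0.002},
    "thermostat": {"algorithm": "v-rescale", "temperature_K": 300.0},
}

flat = flatten(example)
for key in sorted(flat):
    print(key, "=", flat[key])
```

The flat form is convenient for searching and indexing, while the nested form keeps the parameters of each algorithm together. Because both forms carry the same information, either could serve as the storage format.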

Regarding the technical infrastructure, we have outlined the initial technical framework that will support MDDB operations, drawing on insights from existing MD database projects (pilot use cases). We investigated the technical requirements of the specific systems and methods included in the datasets and identified possible issues, in order to inform decisions on the architecture, design, and development of the technical infrastructure prototype. We are using new datasets (pilot cases) to test the available software stack in a federated layout. Two MDDB nodes have been set up at IRB and BSC, demonstrating that our concept of a federated database is achievable, although the federation is still small.
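
The federated layout can be pictured with a small Python sketch. This is not the MDDB software stack: the `Node` class, the node names, and the accession identifiers are hypothetical, and real nodes would be reached over a network rather than held in memory. The sketch only shows the general idea that each node stores part of the data while a client sees a single combined catalogue.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """In-memory stand-in for one node of a federated database."""
    name: str
    entries: Dict[str, dict] = field(default_factory=dict)

    def get(self, accession: str) -> Optional[dict]:
        return self.entries.get(accession)


def federated_get(nodes: List[Node], accession: str) -> Optional[dict]:
    """Ask each node in turn and return the first matching record."""
    for node in nodes:
        record = node.get(accession)
        if record is not None:
            # Record which node answered, so the source stays traceable.
            return {"node": node.name, **record}
    return None


def federated_list(nodes: List[Node]) -> List[str]:
    """Combined, sorted catalogue of the accessions held by all nodes."""
    return sorted({acc for node in nodes for acc in node.entries})


# Two hypothetical nodes, each holding a different part of the data.
nodes = [
    Node("node-a", {"A0001": {"title": "example trajectory 1"}}),
    Node("node-b", {"A0002": {"title": "example trajectory 2"},
                    "A0003": {"title": "example trajectory 3"}}),
]

print(federated_list(nodes))
print(federated_get(nodes, "A0002"))
```

Adding a node in this sketch means appending one more `Node` to the list; the lookup functions do not change.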

Efforts to assess technical feasibility have been complemented by the definition of a dissemination plan targeting end-users and other key stakeholders, whose input and support will be essential to the development and future sustainability of the MDDB infrastructure. Events and other means of engaging with MDDB will be intensified during the second year of the project. All information will be published on the project website (www.mddbr.eu).

The MDDB project has made significant progress in its first year, particularly in addressing the challenges of managing MD data. By focusing on establishing a comprehensive repository and enhancing data management practices, MDDB aims to promote the efficient and widespread use of MD simulations, ultimately leading to innovations that can benefit society at large.

One notable outcome is the progress towards making MD data FAIR by implementing robust data collection practices, defining interoperability standards, and establishing quality checking procedures, so that the data become readily available to the broader research community. This not only fosters collaboration but also accelerates scientific discovery by unlocking previously inaccessible insights within MD simulations. Moreover, MDDB has taken the lead in establishing community-wide standards for biomolecular data compression, aiming to enhance efficiency beyond existing methods. This work streamlines data management and paves the way for further advancements in MD simulation techniques.

Moving forward, the key needs for further uptake and success include refining data management practices, demonstrating the efficacy of MDDB's approach in real-world scenarios, and securing the project's financial sustainability. Additionally, efforts to enhance international collaboration, regulatory support, and standardisation frameworks will be crucial in advancing MDDB's impact on the scientific community.

Overall, MDDB's progress in its first year demonstrates its potential to revolutionise MD simulations and contribute to solving complex biological and chemical problems. With continued support and collaboration, MDDB is confident that it will make significant contributions to scientific research and innovation in the years to come.