EU Public Infrastructure for the European DIgital Twin Ocean

Periodic Reporting for period 1 - EDITO-Infra (EU Public Infrastructure for the European DIgital Twin Ocean)

Reporting period: 2022-10-01 to 2023-12-31

The main aim of the “EU Public Infrastructure for the European Digital Twin Ocean (EDITO-Infra)” project is to build the public infrastructure backbone for the European Digital Twin of the Ocean. This will be done by upgrading, combining, and integrating key service components of the existing EU ocean observing, monitoring and data programs, namely the “Copernicus Marine Service” and the “European Marine Observation & Data Network (EMODnet)”, into a single digital framework. In that context, EDITO-Infra is both the name of the project and the name of the platform that will be deployed. As such, EDITO-Infra will provide the foundation for the further development of the European Digital Twin Ocean, hosting the deployment of multiple digital twin applications from ongoing and future digital twin projects, and supporting the deployment of a new generation of ocean models (i.e. EDITO-ModelLab) and other related initiatives, including Horizon Europe “Mission Lighthouses” projects.

Technical activities have focused on three areas: the development of the platform (design, development process, implementation, deployment and operation, and support); contributions to the platform (publication of data, processes, services, tutorials, and capability demonstrators); and the development of demonstrators.
The platform was designed using Domain-Driven Design to define the essential domain boundaries: managing the data interface in the Data Lake, handling remote functions and processes in the engine, and dedicating a services domain to managing service interfaces. Identity and Access Management and Secret Management focus on authentication and authorization, fostering an open and collaborative platform.
The design process also involves adhering to selected standards and conventions, including OGC STAC, API Processes, API Features, Zarr, Climate Forecast Convention, REST API, JSON API, and S3 API. Component selection includes evaluating off-the-shelf software or opting for bespoke developments when existing solutions are unsuitable.
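As an illustration of how two of these standards fit together, the following is a minimal sketch of a STAC Item (a GeoJSON Feature) that points at a Zarr store held in S3-compatible object storage. The item identifier, bounding box, bucket name, and the `make_stac_item` helper are hypothetical and are not taken from the EDITO catalog; the `application/vnd+zarr` media type is a commonly used convention and is assumed here.

```python
import json
from datetime import datetime, timezone


def make_stac_item(item_id, bbox, start, href):
    """Build a minimal STAC Item (a GeoJSON Feature) pointing at a Zarr store."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        # Footprint polygon derived from the bounding box (closed ring).
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        # STAC expects an RFC 3339 timestamp in UTC.
        "properties": {"datetime": start.isoformat().replace("+00:00", "Z")},
        "links": [],
        "assets": {
            "data": {
                "href": href,  # hypothetical S3 location
                "type": "application/vnd+zarr",  # assumed media type
                "roles": ["data"],
            }
        },
    }


# Hypothetical example values, for illustration only.
item = make_stac_item(
    "sea-surface-temperature-demo",
    (-10.0, 35.0, 5.0, 50.0),
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    "s3://example-bucket/sst-demo.zarr",
)
print(json.dumps(item, indent=2))
```

The catalog entry carries only metadata and a link; the data itself stays in object storage, which is what lets a catalog describe datasets from different providers in one place.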
The infrastructure has been selected in order to meet scalability and availability requirements while maintaining platform-agnostic capabilities.
The development process follows an Agile methodology adapted to the available human resources, which includes specifying Agile methodology guidelines.
API implementation spans several domains. In the data domain, the platform organizes and retrieves data using a metadata catalog and a search engine dedicated to geospatial data. Robust and scalable storage is provided by an object storage service (S3), using specific tools for AI data infrastructure.
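To make the idea of a geospatial metadata search concrete, here is a small sketch of the kind of filtering such a search engine performs: keep the catalog entries whose bounding box intersects the query area and whose timestamp falls in the query window. This is an in-memory illustration under stated assumptions, not the platform's search engine; the item identifiers and the `bbox_intersects` and `search` helpers are hypothetical.

```python
def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap (no antimeridian handling)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(items, bbox, start, end):
    """Return ids of items overlapping `bbox` with start <= datetime <= end.

    Timestamps are compared as strings, which is only valid because they all
    share the same fixed-width UTC format (YYYY-MM-DDTHH:MM:SSZ).
    """
    return [
        it["id"]
        for it in items
        if bbox_intersects(it["bbox"], bbox) and start <= it["datetime"] <= end
    ]


# Hypothetical catalog entries, for illustration only.
catalog = [
    {"id": "north-sea-2023-01", "bbox": [-4, 51, 9, 61], "datetime": "2023-01-15T00:00:00Z"},
    {"id": "baltic-2023-01", "bbox": [10, 53, 30, 66], "datetime": "2023-01-15T00:00:00Z"},
    {"id": "north-sea-2022-06", "bbox": [-4, 51, 9, 61], "datetime": "2022-06-15T00:00:00Z"},
]

hits = search(catalog, [0, 50, 5, 55], "2023-01-01T00:00:00Z", "2023-01-31T23:59:59Z")
print(hits)  # ['north-sea-2023-01']
```

A production search engine would rely on spatial and temporal indexes rather than a linear scan, but the selection criteria are the same.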

Concerning processes, a custom process registry has been developed, and tools are in place to streamline process management and deployment. Collaborative software development takes place in a GitLab repository, supported by code review tools.
The service catalog promotes modular and interoperable services and relies on an open-source tool.
A Viewer component enhances data visualization capabilities. Additionally, a collaborative tutorials platform enables users to share knowledge within the EDITO platform.
Identity and access management provides capabilities such as Single Sign-On (SSO). For security, a secret management tool such as Vault is used to safeguard sensitive information and protect platform integrity.
This API implementation strategy creates a robust, collaborative, and secure platform, integrating various services and components to meet diverse user needs within the ecosystem.
The EDITO platform prototype is operational (deployed and working), monitored to ensure platform resilience, and open to beta testers.
A staging environment facilitates production tests and supports ongoing development efforts.
The platform is used by ~50 beta testers, and support is provided via MS Teams and email.

The EDITO Data Lake is designed to let users easily search, retrieve, and add data by enforcing a metadata framework that ensures standardized and meaningful categorization of information. Datasets from the EMODnet Central Portal have been successfully converted to the Zarr format, optimizing their storage and retrieval. In parallel, a compatible SpatioTemporal Asset Catalog (STAC) has been created to organize and describe these datasets, with the aim of giving users a unified and comprehensive overview of the available datasets. Copernicus Marine data has also been integrated into EDITO: a workflow ingests Copernicus Marine Service metadata into the EDITO data catalog, a compatible STAC catalog has been created, and efforts are under way to merge it seamlessly into the EDITO catalog. Ongoing collaborative development to refine and expand the capabilities of the EDITO catalog and the associated Data Lake ensures that the platform remains dynamic, accommodating a growing range of datasets while fostering interoperability and accessibility. The EDITO Data Lake also allows users to reference, store, and share their own data.
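One reason a chunked format such as Zarr helps retrieval is that a reader only needs to fetch the chunks that overlap the requested subset, not the whole array. The sketch below works this out for a hypothetical array; the array shape, chunk shape, and the `chunks_for_slice` and `chunks_for_selection` helpers are illustrative assumptions and do not describe the actual EDITO datasets.

```python
from itertools import product


def chunks_for_slice(start, stop, chunk_size):
    """Chunk indices along one axis touched by the half-open range [start, stop)."""
    if stop <= start:
        return range(0)
    return range(start // chunk_size, (stop - 1) // chunk_size + 1)


def chunks_for_selection(selection, chunk_shape):
    """All chunk index tuples needed to read a rectangular selection."""
    axes = [
        chunks_for_slice(lo, hi, size)
        for (lo, hi), size in zip(selection, chunk_shape)
    ]
    return list(product(*axes))


# Hypothetical (time, lat, lon) array and chunking, for illustration only.
array_shape = (365, 720, 1440)
chunk_shape = (30, 180, 360)

# Read 31 time steps over a 100 x 100 grid-cell window.
selection = [(0, 31), (100, 200), (700, 800)]
needed = chunks_for_selection(selection, chunk_shape)

# Total chunk count uses ceiling division along each axis.
total = 1
for n, c in zip(array_shape, chunk_shape):
    total *= -(-n // c)

print(f"{len(needed)} of {total} chunks needed")  # 8 of 208 chunks needed
```

In object storage each chunk is typically a separate object, so a small regional or temporal subset translates into a handful of requests instead of a full-file download.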
Some ocean-oriented processes and common data science and Artificial Intelligence tools for end-users have been published: virtual R&D environments/IDEs (JupyterLab, RStudio, VSCode, etc.); databases (Postgres, S3, ElasticSearch, MongoDB, lakeFS, etc.); and automation tools and workflow managers. Ocean-oriented services for end-users, such as Autosubmit and fake HPC nodes, have been published to enable job dispatching on EuroHPC's Barcelona SuperComputer (BSC).
About 10 tutorials have been published to demonstrate the platform capabilities and to document the service API.
The first version of the demonstrator for running the NEMO ocean general circulation model on the EDITO platform has been released. Work is ongoing on the demonstrator for fisheries planning, in collaboration with ILIAD.

The prototype of the EDITO-Infra platform offers an innovative infrastructure to enhance interoperability and to enable the implementation of what-if scenarios. The EDITO Data Lake eases access to marine data products and will offer seamless access to EMODnet and Copernicus Marine Service products.

EDITO concept

EDITO-Infra infographics

EDITO-Architecture
