Periodic Reporting for period 1 - EDITO-Infra (EU Public Infrastructure for the European DIgital Twin Ocean)
Reporting period: 2022-10-01 to 2023-12-31
The platform design follows a Domain-Driven Design approach to define its essential domain boundaries: a data domain that manages the data interface in the Data Lake, a processes domain in which the engine handles remote functions and processes, and a services domain that manages service interfaces. Identity and Access Management (IAM) and Secret Management focus on authentication and authorization, supporting an open and collaborative platform.
The design process also involves adhering to selected standards and conventions, including OGC STAC, OGC API - Processes, OGC API - Features, Zarr, the Climate and Forecast (CF) Conventions, REST API, JSON API, and the S3 API. Components are selected by first evaluating off-the-shelf software and opting for bespoke development when existing solutions are unsuitable.
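Two of the listed standards, Zarr and the S3 API, fit together because Zarr splits an array into chunks that are stored as separate objects, so a reader fetches only the chunks that overlap its request. The sketch below illustrates that arithmetic under stated assumptions: it uses Zarr version 2 default chunk key naming (`i.j.k`), and the array shape and chunk sizes are hypothetical, not EDITO's actual configuration.

```python
import math
from itertools import product


def chunk_keys(shape, chunks, selection):
    """Return the Zarr v2-style keys of the chunks a selection touches.

    shape:     array shape, e.g. (time, lat, lon)
    chunks:    chunk size along each dimension
    selection: one half-open (start, stop) index range per dimension
    """
    per_dim = []
    for size, chunk, (start, stop) in zip(shape, chunks, selection):
        if not 0 <= start < stop <= size:
            raise ValueError(f"invalid range {start}:{stop} for dimension of size {size}")
        # Chunk indices of the first and last element in the range.
        per_dim.append(range(start // chunk, (stop - 1) // chunk + 1))
    # Zarr v2 joins chunk indices with "." by default (dimension_separator=".").
    return [".".join(str(i) for i in idx) for idx in product(*per_dim)]


def total_chunks(shape, chunks):
    """Number of chunks (one object each when stored on S3) in the whole array."""
    return math.prod(math.ceil(size / chunk) for size, chunk in zip(shape, chunks))


# Hypothetical daily field: 365 days on a 720 x 1440 grid.
shape, chunks = (365, 720, 1440), (30, 180, 360)
print(chunk_keys(shape, chunks, [(0, 10), (100, 200), (0, 360)]))  # ['0.0.0', '0.1.0']
print(total_chunks(shape, chunks))  # 208
```

In this example a 10-day regional read touches 2 of the 208 chunk objects; the chunk shape chosen at conversion time therefore largely determines how cheap typical queries are.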
The infrastructure was selected to meet scalability and availability requirements while keeping the platform agnostic of the underlying infrastructure.
Development follows an Agile methodology adapted to the available human resources, including the definition of Agile methodology guidelines for the project.
The API implementation spans several domains. In the data domain, the platform organizes and retrieves data through a metadata catalog and a search engine dedicated to geospatial data. Scalable storage is provided by an S3 object storage service, using dedicated tools for AI data infrastructure.
For processes, a custom process registry has been developed, and tools are in place to streamline process management and deployment. Collaborative software development takes place in a GitLab repository, using code review tools.
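The report does not describe the internals of the custom process registry. Since OGC API - Processes is among the adopted standards, the following minimal sketch shows what a client call to execute a registered process could look like under that standard: a POST to `/processes/{processId}/execution` with the inputs in a JSON body. The endpoint, process identifier and inputs are hypothetical, and the request is built but deliberately not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_execute_request(base_url, process_id, inputs, asynchronous=True):
    """Build, without sending, an OGC API - Processes execution request."""
    url = f"{base_url.rstrip('/')}/processes/{quote(process_id, safe='')}/execution"
    headers = {"Content-Type": "application/json"}
    if asynchronous:
        # Ask the server to run the process as a job instead of blocking until it finishes.
        headers["Prefer"] = "respond-async"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


# Hypothetical endpoint, process identifier and inputs.
req = build_execute_request(
    "https://processes.example.org/",
    "ocean-model-demo",
    {"start_date": "2023-01-01", "duration_days": 10},
)
print(req.get_method(), req.full_url)
# POST https://processes.example.org/processes/ocean-model-demo/execution
```

Passing `req` to `urllib.request.urlopen` would send it. Available process identifiers would normally be discovered first with a GET on the `/processes` collection.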
The service catalog, built on an open-source tool, promotes modular and interoperable services.
A Viewer component provides data visualization capabilities, and a collaborative tutorials platform enables users to share knowledge within the EDITO platform.
Identity and access management provides capabilities such as Single Sign-On (SSO). For security, a secret management solution such as Vault is used to safeguard sensitive information and protect platform integrity.
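How EDITO wires Vault into the platform is not described here. As a hedged illustration of the secret management pattern, the sketch below builds a read request against Vault's HTTP API for a secret stored in a KV version 2 engine, where secrets live under `/v1/<mount>/data/<path>` and the response nests the key/value pairs under `data.data`. The address, token, mount and path are hypothetical; in practice a token would be obtained through a Vault auth method rather than hard-coded.

```python
import json
from urllib.request import Request


def build_kv2_read(vault_addr, token, mount, path):
    """Build, without sending, a read request for a secret in a KV version 2 engine."""
    url = f"{vault_addr.rstrip('/')}/v1/{mount.strip('/')}/data/{path.strip('/')}"
    # Vault authenticates HTTP API calls with the X-Vault-Token header.
    return Request(url, headers={"X-Vault-Token": token}, method="GET")


def extract_secret(response_body):
    """KV v2 wraps the key/value pairs in data.data (version info sits in data.metadata)."""
    return json.loads(response_body)["data"]["data"]


# Hypothetical address, token, mount and secret path.
req = build_kv2_read("https://vault.example.org", "hypothetical-token", "secret", "my-project/s3")
print(req.full_url)  # https://vault.example.org/v1/secret/data/my-project/s3

# Made-up response body in the KV v2 shape, with placeholder values.
sample = json.dumps({
    "data": {
        "data": {"access_key": "example-key", "secret_key": "example-secret"},
        "metadata": {"version": 1},
    }
}).encode("utf-8")
print(extract_secret(sample))
```

Keeping credentials such as S3 keys in Vault means services fetch them at run time instead of embedding them in code or images.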
This API implementation strategy creates a robust, collaborative, and secure platform, integrating various services and components to meet diverse user needs within the ecosystem.
The EDITO platform prototype is operational (deployed and working), monitored to ensure its resilience, and open to beta testers.
A staging environment facilitates pre-production testing and supports ongoing development.
The platform is used by about 50 beta testers, with support provided via MS Teams and email.
The EDITO Data Lake is designed to let users easily search, retrieve, and add data, by enforcing a metadata framework that ensures standardized and meaningful categorization of information. Datasets from the EMODnet Central Portal have been converted to the Zarr format, optimizing their storage and retrieval. In parallel, a compatible SpatioTemporal Asset Catalog (STAC) has been created to organize and describe these datasets, and work is ongoing to give users a unified and comprehensive overview of the available datasets. Copernicus Marine data has also been integrated into EDITO: a workflow ingests Copernicus Marine Service metadata into the EDITO data catalog, and a compatible STAC catalog has been created, with efforts under way to merge it seamlessly. Ongoing collaborative development continues to refine and expand the capabilities of the EDITO catalog and the associated Data Lake, so that the platform remains dynamic, accommodates a growing range of datasets, and fosters interoperability and accessibility. The EDITO Data Lake also allows users to reference, store, and share their own data.
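To make the STAC concepts concrete, the sketch below builds minimal STAC 1.0.0 Items, each describing a dataset by bounding box, time range and a data asset (here a Zarr store on S3), and filters them by spatial and temporal overlap in memory. This is not how the EDITO catalog is queried; a STAC API server performs this filtering server-side through the `bbox` and `datetime` parameters of its item search. Item ids, extents and S3 paths are hypothetical.

```python
from datetime import datetime


def parse_ts(value):
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def make_item(item_id, bbox, start, end, href):
    """Minimal STAC 1.0.0 Item describing a dataset with a time range and one data asset."""
    west, south, east, north = bbox
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        # A null datetime is allowed when start_datetime and end_datetime are given.
        "properties": {"datetime": None, "start_datetime": start, "end_datetime": end},
        "links": [],
        "assets": {"data": {"href": href, "roles": ["data"]}},
    }


def search(items, bbox, start, end):
    """Return ids of items whose bbox intersects `bbox` and whose time range overlaps [start, end]."""
    qw, qs, qe, qn = bbox
    q0, q1 = parse_ts(start), parse_ts(end)
    hits = []
    for item in items:
        w, s, e, n = item["bbox"]
        # Simple 2D intersection; bboxes crossing the antimeridian are not handled.
        if e < qw or w > qe or n < qs or s > qn:
            continue
        props = item["properties"]
        if parse_ts(props["start_datetime"]) > q1 or parse_ts(props["end_datetime"]) < q0:
            continue
        hits.append(item["id"])
    return hits


# Hypothetical items pointing at hypothetical Zarr stores on S3.
items = [
    make_item("dataset-a", (-36, 25, 43, 85), "2020-01-01T00:00:00Z", "2022-12-31T23:59:59Z",
              "s3://example-bucket/dataset-a.zarr"),
    make_item("dataset-b", (-6, 30, 36, 46), "2023-01-01T00:00:00Z", "2023-12-31T23:59:59Z",
              "s3://example-bucket/dataset-b.zarr"),
]
print(search(items, (0, 40, 10, 45), "2023-06-01T00:00:00Z", "2023-06-30T23:59:59Z"))  # ['dataset-b']
```

Both items cover the query area, but only `dataset-b` overlaps June 2023, which is the kind of combined space and time filtering a unified catalog over EMODnet and Copernicus Marine datasets is meant to support.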
Several ocean-oriented processes, together with common data science and artificial intelligence tools, have been published for end users: virtual R&D environments/IDEs (JupyterLab, RStudio, VS Code, etc.); databases and storage (PostgreSQL, S3, Elasticsearch, MongoDB, lakeFS, etc.); and automation tools and workflow managers. Ocean-oriented services such as Autosubmit and fake HPC nodes have also been published to enable job dispatching to EuroHPC resources at the Barcelona Supercomputing Center (BSC).
About 10 tutorials have been published to demonstrate the platform's capabilities and to document the service API.
The first version of a demonstrator for running the NEMO ocean general circulation model on the EDITO platform has been released. Work is ongoing on a fisheries planning demonstrator in collaboration with ILIAD.