CORDIS - EU research results

MAchinE Learning for Scalable meTeoROlogy and cliMate

Periodic Reporting for period 2 - MAELSTROM (MAchinE Learning for Scalable meTeoROlogy and cliMate)

Reporting period: 2022-10-01 to 2024-03-31

To develop Europe’s computer architecture of the future, MAELSTROM has co-designed three components: bespoke compute systems for optimal application performance and energy efficiency, a software framework to optimise usability and training efficiency for machine learning at scale, and large-scale machine learning applications for the domain of weather and climate (W&C) science.

The MAELSTROM compute system designs test machine learning applications across a range of hardware configurations with respect to energy consumption, time-to-solution, numerical precision, and solution accuracy. Customised compute systems are designed around application needs, both to strengthen Europe’s high-performance computing portfolio and to pull recent hardware developments, which are driven by general machine learning applications, toward the needs of W&C applications.
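Energy-to-solution, one of the metrics named above, is commonly obtained by integrating measured power draw over the duration of a run. The following is a minimal sketch of that arithmetic, not the project's benchmarking code: the function name `energy_to_solution` and all power samples are hypothetical and chosen only for illustration.

```python
def energy_to_solution(power_watts, timestamps_s):
    """Integrate sampled power (W) over time (s) with the trapezoidal rule.

    Returns the energy in joules. Both arguments are hypothetical inputs:
    equally long sequences of power readings and their timestamps.
    """
    if len(power_watts) != len(timestamps_s) or len(power_watts) < 2:
        raise ValueError("need at least two samples of equal length")
    energy = 0.0
    for i in range(1, len(power_watts)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Average power over the interval times the interval length.
        energy += 0.5 * (power_watts[i] + power_watts[i - 1]) * dt
    return energy


# Made-up samples for one run on one device (not measured MAELSTROM data).
timestamps = [0.0, 10.0, 20.0, 30.0]
power = [300.0, 320.0, 310.0, 300.0]

time_to_solution = timestamps[-1] - timestamps[0]   # seconds
energy_joules = energy_to_solution(power, timestamps)
energy_watt_hours = energy_joules / 3600.0
```

Comparing `energy_joules` for the same workload across devices shows why time-to-solution alone is not enough: a slower device with a lower power draw can still need less energy.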

The MAELSTROM software framework enables scientists to apply and compare machine learning tools and libraries efficiently across a wide range of computer systems. A user interface links application developers with compute system designers, and benchmarking and error detection of machine learning solutions can be performed during the development phase. Tools will be published as open source.

The MAELSTROM machine learning applications cover all important components of the W&C prediction workflow: the processing of observations, the assimilation of observations to generate initial and reference conditions, model simulations, the post-processing of model data, and the development of forecast products. For each application, benchmark datasets with many terabytes of data are published online for training and for the development of machine learning tools. MAELSTROM machine learning solutions serve as a blueprint for a wide range of machine learning applications.
Overview of project results from the beginning of the project to the end
- By the end of the project, MAELSTROM delivered one W&C machine learning application running operationally (AP1), one application demonstrated within an operational environment (AP3), and four production-ready applications (AP2, AP4, AP5, and AP6) ready to be exploited. MAELSTROM also contributed to the development of ai-models, a tool used operationally to run several emerging data-driven weather models (AP8).
- Tests have been performed on the use of deep learning solutions to generate tangent linear and adjoint model code. This code is essential for 4DVar data assimilation, which generates initial conditions for weather predictions. Results show that machine learning emulators can indeed be used successfully to generate automatically differentiated configurations for use in data assimilation.
- MAELSTROM has supported the development of AIFS, a pure machine learning forecast model that is already running at ECMWF in pre-operational mode and will be turned into a fully operational weather prediction system in the near future.
- MAELSTROM has supported the development of AtmoRep, the most promising approach to large-scale representation learning in W&C. It will likely lead to a European foundation model for Earth science that could revolutionise the way W&C predictions are performed and be applicable across a wide range of W&C machine learning applications.
- MAELSTROM benchmark datasets comprise 22 TB of data that are well documented and available for download online.
- The MAELSTROM applications were tested on more than five processors with several hardware configurations, assessing strengths and weaknesses of hardware and software. Depending on the choice of hardware, a difference of nearly one order of magnitude can be observed for energy-to-solution.
- Accelerator technology offerings followed market developments over the course of the project, and eight different GPU accelerators were tested during the benchmark phases.
- The two dissemination workshops organised at ECMWF each attracted more than 200 registered participants.
- The two hackathons (called MAELSTROM Bootcamps) organised by JSC and ECMWF attracted 30 and 24 onsite participants respectively.
- MAELSTROM scientists have given more than 71 presentations and published 41 papers.
- We have designed a project webpage that provides access to all important information on the project (https://www.maelstrom-eurohpc.eu/), including a feature on Women in Science (https://www.maelstrom-eurohpc.eu/women-in-science) and an interactive fact sheet (https://www.maelstrom-eurohpc.eu/facts).
- MAELSTROM enabled the development of the workflow platform Mantik that allows the submission of scripts to HPC infrastructure, the versioning and sharing of machine learning results with other users, as well as the tracking of training characteristics through MLflow functionalities.
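The tangent linear and adjoint result listed above can be illustrated with a small sketch. Because a machine learning emulator is built from differentiable operations, automatic differentiation yields both operators without hand-written derivative code. The example below uses JAX, with `jax.jvp` for the tangent linear and `jax.vjp` for the adjoint, and then applies the standard dot-product (adjoint) test. The network `emulator`, its sizes, and its random weights are assumptions made for illustration; this is not the MAELSTROM or ECMWF implementation.

```python
import jax
import jax.numpy as jnp


def emulator(params, x):
    """Toy one-hidden-layer network standing in for a learned model component."""
    w1, b1, w2, b2 = params
    hidden = jnp.tanh(w1 @ x + b1)
    return w2 @ hidden + b2


# Hypothetical sizes and random (untrained) weights.
n_state, n_hidden = 4, 8
k1, k2, k3, k4, k5 = jax.random.split(jax.random.PRNGKey(0), 5)
params = (
    0.5 * jax.random.normal(k1, (n_hidden, n_state)),
    jnp.zeros(n_hidden),
    0.5 * jax.random.normal(k2, (n_state, n_hidden)),
    jnp.zeros(n_state),
)

x0 = jax.random.normal(k3, (n_state,))   # linearisation state
dx = jax.random.normal(k4, (n_state,))   # input perturbation
dy = jax.random.normal(k5, (n_state,))   # output-space sensitivity


def model(x):
    return emulator(params, x)


# Tangent linear: Jacobian at x0 applied to dx (forward mode).
_, tl_dx = jax.jvp(model, (x0,), (dx,))

# Adjoint: transposed Jacobian at x0 applied to dy (reverse mode).
_, vjp_fn = jax.vjp(model, x0)
(ad_dy,) = vjp_fn(dy)

# Dot-product test: <M dx, dy> should equal <dx, M^T dy> up to round-off.
lhs = jnp.vdot(tl_dx, dy)
rhs = jnp.vdot(dx, ad_dy)
adjoint_test_passed = bool(jnp.allclose(lhs, rhs, rtol=1e-4, atol=1e-5))
```

The dot-product test is the usual consistency check for a tangent linear and adjoint pair; here it passes by construction because both operators are derived from the same function.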
MAELSTROM has contributed significantly to an improved understanding of how to design machine learning applications that are customised for the W&C domain. Many synergies were created between the applications, for example between the downscaling application and the application investigating the use of crowd-sourced data. This allowed for the design of improved weather and climate prediction systems that are already in pre-operational and operational use, in particular at ECMWF and MetNor. MAELSTROM has also established benchmark datasets and applications that describe machine learning workloads for the HPC community, as well as new software tools that facilitate the training and inference of machine learning models and will be used for years to come. Together, these make the W&C community more visible when the next generation of supercomputers is designed. Finally, MAELSTROM has generated insight into the main performance bottlenecks when running W&C machine learning applications on state-of-the-art compute system designs.
MAELSTROM co-design cycle