Periodic Reporting for period 2 - MAELSTROM (MAchinE Learning for Scalable meTeoROlogy and cliMate)
Reporting period: 2022-10-01 to 2024-03-31
The MAELSTROM compute system design work tests machine learning applications across a range of hardware configurations with respect to energy consumption, time-to-solution, numerical precision, and solution accuracy. Customised compute systems, optimised for application needs, are designed to strengthen Europe’s high-performance computing portfolio and to pull recent hardware developments, driven by general machine learning applications, toward the needs of weather and climate (W&C) applications.
The MAELSTROM software framework enables scientists to apply and compare machine learning tools and libraries efficiently across a wide range of computer systems. A user interface links application developers with compute system designers, and benchmarking and error detection of machine learning solutions can be performed during the development phase. Tools will be published as open source.
The MAELSTROM machine learning applications cover all important components of the W&C prediction workflow, including the processing of observations, the assimilation of observations to generate initial and reference conditions, model simulations, the post-processing of model data, and the development of forecast products. For each application, benchmark datasets with many terabytes of data are published online for training and for the development of machine learning tools. MAELSTROM machine learning solutions serve as a blueprint for a wide range of machine learning applications.
- By the end of the project, MAELSTROM delivered one W&C machine learning application running operationally (AP1), one application demonstrated within an operational environment (AP3), and four production-ready applications (AP2, AP4, AP5, and AP6) ready to be exploited. MAELSTROM also contributed to the development of ai-models, a tool used operationally to run several emerging data-driven weather models (AP8).
- Tests have been performed on the use of deep learning solutions to generate tangent linear and adjoint model code. This code is essential for 4DVar data assimilation, which generates initial conditions for weather predictions. Results show that machine learning emulators can indeed be used successfully to generate automatically differentiated configurations for use in data assimilation.
- MAELSTROM has supported the development of AIFS, a pure machine learning forecast model that is already running at ECMWF in pre-operational mode and will be turned into a fully operational weather prediction system in the near future.
- MAELSTROM has supported the development of AtmoRep, the most promising approach to using large-scale representation learning in W&C. It will likely lead to the development of a European foundation model for Earth science that could revolutionise the way W&C predictions are performed and be applicable across a wide range of W&C machine learning applications.
- MAELSTROM benchmark datasets comprise 22 TB of data that are well documented and available for download online.
- The MAELSTROM applications were tested on more than five processors with several hardware configurations, assessing strengths and weaknesses of hardware and software. Depending on the choice of hardware, a difference of nearly one order of magnitude can be observed for energy-to-solution.
- Accelerator technology offerings followed market developments over the course of the project, and eight different GPU accelerators were tested during the benchmark phases.
- The two dissemination workshops organised at ECMWF each attracted more than 200 registered participants.
- The two hackathons (called MAELSTROM Bootcamps) organised by JSC and ECMWF attracted 30 and 24 onsite participants, respectively.
- MAELSTROM scientists have given more than 71 presentations and published 41 papers.
- We have designed a project webpage that provides access to all important information on the project (https://www.maelstrom-eurohpc.eu/), including a feature on Women in Science (https://www.maelstrom-eurohpc.eu/women-in-science) and an interactive fact sheet (https://www.maelstrom-eurohpc.eu/facts).
- MAELSTROM enabled the development of the workflow platform Mantik that allows the submission of scripts to HPC infrastructure, the versioning and sharing of machine learning results with other users, as well as the tracking of training characteristics through MLflow functionalities.
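
The tangent linear and adjoint models mentioned above in the context of 4DVar can be illustrated with a minimal sketch. The toy emulator below (a one-hidden-layer tanh network with random weights) and all names in it (`emulator`, `tangent_linear`, `adjoint`) are hypothetical and are not MAELSTROM code. The derivatives are written by hand in plain Python to keep the sketch dependency-free, whereas an automatic differentiation framework would derive them from the emulator itself. The sketch ends with a finite-difference check and the standard dot-product (adjoint) test, which verifies that the two linearised models are consistent.

```python
import math
import random


def matvec(W, x):
    """Return W @ x for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mat_t_vec(W, y):
    """Return W^T @ y."""
    return [sum(W[i][j] * y[i] for i in range(len(W))) for j in range(len(W[0]))]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def emulator(params, x):
    """Toy nonlinear emulator: y = W2 @ tanh(W1 @ x). Hypothetical stand-in."""
    W1, W2 = params
    return matvec(W2, [math.tanh(z) for z in matvec(W1, x)])


def tangent_linear(params, x, dx):
    """Propagate a perturbation dx forward through the linearised emulator."""
    W1, W2 = params
    z = matvec(W1, x)
    dz = matvec(W1, dx)
    dh = [(1.0 - math.tanh(zi) ** 2) * dzi for zi, dzi in zip(z, dz)]
    return matvec(W2, dh)


def adjoint(params, x, y_bar):
    """Propagate an output sensitivity y_bar backward (transpose of the TL)."""
    W1, W2 = params
    z = matvec(W1, x)
    h_bar = mat_t_vec(W2, y_bar)
    z_bar = [(1.0 - math.tanh(zi) ** 2) * hb for zi, hb in zip(z, h_bar)]
    return mat_t_vec(W1, z_bar)


# Random weights and vectors; sizes and seed are arbitrary assumptions.
rng = random.Random(0)
n_state, n_hidden = 4, 6
W1 = [[rng.gauss(0, 0.5) for _ in range(n_state)] for _ in range(n_hidden)]
W2 = [[rng.gauss(0, 0.5) for _ in range(n_hidden)] for _ in range(n_state)]
params = (W1, W2)
x = [rng.gauss(0, 1) for _ in range(n_state)]
dx = [rng.gauss(0, 1) for _ in range(n_state)]
y_bar = [rng.gauss(0, 1) for _ in range(n_state)]

# Check 1: the tangent linear model matches a finite-difference perturbation.
eps = 1e-6
y0 = emulator(params, x)
y1 = emulator(params, [xi + eps * dxi for xi, dxi in zip(x, dx)])
fd = [(b - a) / eps for a, b in zip(y0, y1)]
fd_error = max(abs(f - t) for f, t in zip(fd, tangent_linear(params, x, dx)))

# Check 2: dot-product test, <TL dx, y_bar> should equal <dx, AD y_bar>.
lhs = dot(tangent_linear(params, x, dx), y_bar)
rhs = dot(dx, adjoint(params, x, y_bar))
adjoint_test_error = abs(lhs - rhs)
```

In this sketch, `adjoint_test_error` should sit at floating-point rounding level, while `fd_error` is limited by the finite-difference step `eps`. An adjoint that is inconsistent with its tangent linear model would fail the dot-product test by a much larger margin.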