Periodic Reporting for period 2 - eFlows4HPC (Enabling dynamic and Intelligent workflows in the future EuroHPC ecosystem)
Reporting period: 2022-07-01 to 2024-02-29
After three years of dedicated research and collaborative efforts, the eFlows4HPC project has successfully delivered a comprehensive workflow platform and an additional set of services facilitating the seamless integration of HPC simulation and modeling with big data analytics and machine learning processes.
The two major project outcomes, HPC Workflows as a Service (HPCWaaS) and the project software stack, are crucial in supporting the development, deployment and execution of complex workflows, including new intelligent approaches to make the best use of the computational resources. These advancements play a key role in tackling challenges related to natural disasters, climate change and the optimization of manufacturing processes.
The project has designed and developed the HPCWaaS methodology. Some important components of the methodology are provided as services: the data catalogue service, which links to project data sets; the Data Logistic service, which moves data from external repositories to the HPC systems based on data pipelines; and the Container Image Creation service, which generates container images for specific hardware.
From the pillar applications, the project identified computational and artificial intelligence kernels with performance bottlenecks. A selection of these kernels was optimized for GPUs, FPGAs and, in particular, for the European Processor ecosystem. The project has also researched the use of new storage devices such as NVRAMs as an alternative to traditional disks for persistent data storage.
Pillar I aimed at developing digital twins for manufacturing, with a workflow built on top of the eFlows4HPC software stack that runs an end-to-end full reduction model of the cooling system of a SIEMENS electrical engine. The final version integrates real physics from heat transfer in electric motors, validated by comparing the results with commercial codes from SIEMENS. The Pillar has achieved speedups of the order of 50x-100x and a large impact on memory efficiency. The partner CIMNE is following an innovation path, marketed as SimTwins, to commercialize the pillar results and has won an innovation award.
Pillar II focused on the development of two workflows: the Dynamic (AI-assisted) Earth System Model (ESM) workflow, which aims at pruning ensemble runs based on runtime analytics; and the Statistical analysis and feature extraction workflow, which aims at predicting tropical cyclones. The ESM workflow is developed on top of the software stack, including PyCOMPSs for workflow orchestration and Hecuba for data management. It has been deployed on multiple supercomputers, such as MareNostrum 4 (BSC) and Levante (DKRZ). The feature extraction workflow integrates advanced features from PyCOMPSs to manage streamed data. The application of this workflow to real data demonstrated its ability to track tropical cyclones in the North Pacific.
Pillar III aimed at developing workflows for earthquakes (UCIS4EQ) and for their subsequent tsunamis (FTRT/PTF). The two workflows have been designed and developed on top of the project software. The Pillar has also developed MLESmap, a novel procedure that exploits the predictive power of ML algorithms to estimate ground acceleration values a few seconds after a large earthquake occurs. Thanks to these workflows, the partners have been able to perform significant studies, such as predicting high-resolution inundation maps from tsunami simulations and new validation studies of the Puebla and Kos-Bodrum earthquakes. In addition, the pillar has produced a white paper with recommendations for urgent computing for natural hazards.
All project software (software stack, HPCWaaS methodology and developed workflows) is available as open source in public repositories and with online documentation.
The project has made substantial efforts in dissemination, training and technology transfer. The project partners have published 32 articles in journals and conferences. Eight training courses have been delivered, all but one targeting external participants. The project has organized four workshops focusing on user communities, with 10 scientific communities/initiatives engaged.
The project has selected 14 Key Exploitable Results (KERs) from a total of 47 project results initially listed, based on the degree of innovation, the exploitability and the impact of each result. Most of the KERs are under exploitation in follow-up projects or by the partners themselves. Most of the workflows are sustained by the user community, and one of them (Pillar I) is following an innovation path towards commercialization.
The project has defined a set of complex workflows that leverage these technologies in scientific and industrial applications, and has delivered optimizations of specific application kernels targeting modern HPC architectures. This has enabled the generation of new scientific results in the research areas of the pillars.
The social impact of the project includes support for a wider adoption of HPC by new communities, through automated deployment and execution of workflows; a reduction in the time to react to a natural hazard and improved early warning methodologies; better prediction of tropical cyclones; contributions towards DestinE Digital Twins; and improved mitigation and adaptation strategies for future climate change.
The economic impact of the project relates to improved efficiency of HPC infrastructures by means of dynamic and intelligent approaches; better use of computational resources, reducing carbon and energy footprints; and enhanced portability of codes between systems.
There is also industrial impact: promoting European technology, providing solutions for the design of industrial digital twins that are under commercialization, achieving faster time to solution for industry, and improving manufacturing processes.