Periodic Reporting for period 2 - SCALABLE (SCAlable LAttice Boltzmann Leaps to Exascale)
Reporting period: 2022-07-01 to 2023-12-31
In SCALABLE, eminent industrial and academic partners team up to bring an industrial LBM-based computational fluid dynamics (CFD) software to unprecedented levels of performance, scalability, and energy efficiency.
Over the last 10 years, Lattice Boltzmann methods (LBM) have evolved into trustworthy alternatives to conventional CFD. They are also flexible enough to be extended to complex, dynamically changing geometries, multiphase flows, and a wide range of other multiphysics applications of high industrial relevance.
In the context of EuroHPC, the distinguishing critical feature of the LBM is the algorithmic locality stemming from an explicit time step. This makes the LBM especially well-suited to exploit advanced supercomputer architectures.
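The locality mentioned above can be illustrated with a minimal sketch. It is not taken from waLBerla or LaBS; it assumes the simplest textbook setting (a one-dimensional D1Q3 lattice, BGK collision, periodic boundaries), and all function names are hypothetical. The collision step reads only the cell's own data, and the streaming step reads only nearest neighbours, which is why the method maps well onto distributed-memory machines.

```python
# Minimal D1Q3 lattice Boltzmann sketch (assumed textbook BGK model,
# not the waLBerla or LaBS implementation).
C = (0, 1, -1)            # discrete velocities
W = (2 / 3, 1 / 6, 1 / 6)  # lattice weights, speed of sound squared = 1/3


def equilibrium(rho, u):
    """Second-order equilibrium populations for density rho and velocity u."""
    return [w * rho * (1 + 3 * c * u + 4.5 * (c * u) ** 2 - 1.5 * u * u)
            for c, w in zip(C, W)]


def collide(cell, tau):
    """BGK relaxation: uses only the populations stored in this one cell."""
    rho = sum(cell)
    u = sum(c * fi for c, fi in zip(C, cell)) / rho
    feq = equilibrium(rho, u)
    return [fi - (fi - fe) / tau for fi, fe in zip(cell, feq)]


def step(f, tau):
    """One explicit time step: local collision, then nearest-neighbour streaming."""
    n = len(f)
    post = [collide(cell, tau) for cell in f]
    # Population i at cell x arrives from cell x - C[i] (periodic domain).
    return [[post[(x - c) % n][i] for i, c in enumerate(C)] for x in range(n)]


def total_mass(f):
    return sum(sum(cell) for cell in f)


if __name__ == "__main__":
    n = 16
    # Uniform fluid at rest with a small density bump in the middle.
    f = [equilibrium(1.1 if x == n // 2 else 1.0, 0.0) for x in range(n)]
    m0 = total_mass(f)
    for _ in range(50):
        f = step(f, tau=0.8)
    print("mass drift:", abs(total_mass(f) - m0))
```

Because one time step only couples a cell to its direct neighbours, a parallel code needs to exchange just a thin layer of boundary cells between processes per step.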
waLBerla's performance excels because of its unique, architecture-specific automatic generation of optimized compute kernels, together with carefully designed parallel data structures. waLBerla, however, does not meet the requirements of industrial applications.
The industrial CFD software LaBS, on the other hand, already has such industrial capabilities at a proven high level of maturity, but its performance leaves room for improvement. Therefore, SCALABLE will transfer the leading-edge performance technology from waLBerla to LaBS. The collaboration will deliver improved efficiency and scalability so that LaBS is prepared for the upcoming European Exascale systems.
The project outcomes will be disseminated through the LaBS software and will directly benefit European industry. SCALABLE will also contribute to fundamental research, including energy-efficient computing, GPU-accelerated kernels, and a novel memory-efficient sparse data structure available as open-source software within the waLBerla framework.
Objectives
The primary goal of SCALABLE is to develop an industrial LBM-based CFD solver capable of exploiting current and future extreme-scale architectures, expanding the capabilities of existing industrial LBM solvers by at least two orders of magnitude in terms of processor cores and lattice cells.
The ultimate goal is then to fundamentally enhance the predictive capabilities of industrial CFD software by making it usable on pre-exascale supercomputers for industrial-class applications.
For the 1st objective, the work carried out in partnership by CS GROUP, CERFACS, IT4I and ERLANGEN identified the most appropriate test cases to expose the bottlenecks of the two Lattice Boltzmann codes and the differences between them.
The POP study highlighted the calculation steps in which the waLBerla and LaBS codes have bottlenecks. As expected, the bottlenecks were not the same, and both codes were able to learn from this first inventory.
The POP study was also valuable with regard to energy consumption: it showed that the different static tuning methods make it possible to reduce energy consumption while maintaining acceptable calculation times.
For the 2nd objective, the collective work of JUELICH, CS GROUP, ERLANGEN, CERFACS and IT4I, building on the bottlenecks detected by the POP study and on the comparison of the two codes, identified the pre-processing stages of LaBS whose memory use was problematic. Some of the pre-processing steps have been reworked and the memory footprint reduction objective has been achieved: the memory peaks of the pre-processing steps are now equivalent to those of the solver step. Single-precision runs remain a point of attention, and areas for improvement have been identified.
Overall calculation performance is also being improved, and several approaches have been tested. A hybridization strategy based on an MPI / OpenMP approach has been developed and tested on simplified cases. A prototype of LaBS on ARM processors and GPUs is being evaluated, and the 2nd part of the project will assess the gains obtained through these technologies. Manufacturers such as AIRBUS and RENAULT regularly renew their HPC clusters and are very attentive to the results on these new architectures. They could be led to modify their production clusters to improve their competitiveness via these new approaches.
For the 3rd objective, the work carried out by ERLANGEN, JUELICH and CS GROUP has improved the performance of LaBS. Calculations with rotating domains are no longer a problem, and manufacturers can now add them without hesitation.
Work on automatic code generation has been a success and now makes it easier to test LBM algorithms.
Regarding the 4th objective, now that the first results have been presented, new manufacturers such as SAFRAN Tech or Airbus D&S are interested in LaBS.
• For waLBerla, this will provide the ability to do more generic computing. During the low-TRL phases, the assessment of physical and numerical models will be easier on more complex configurations thanks to these improvements. As a consequence, the conclusions and orientations for each tested model will be more robust: the gap between low-TRL developments and higher-TRL validations will be reduced.
• For LaBS, thanks to the new data structure and optimized approaches, we are moving towards pre-exascale CPU/GPU HPC architectures with exceptional solver performance in terms of both accuracy and speed.
In LaBS, the first test of a data-structure change inspired by the waLBerla data structure did not demonstrate better performance. We therefore drew instead on waLBerla's methods and algorithms to speed up calculations and strongly reduce the data exchanges between processors.
Then, in cooperation between CS GROUP and CERFACS, we removed bottlenecks, allowing us to scale beyond 8000 processors. We are currently working on the management of the output data, which now appears to be the next bottleneck for very large numbers of MPI processes. Parallelizing this output step will allow us to overcome the bottleneck. It will then be possible, in a single calculation with a very large number of MPI processes, to request larger and more precise outputs.
Thanks to IT4I and CERFACS, both codes should benefit from the improvements made by the dynamic tuning of processors to reduce energy consumption. These savings will benefit both industrial manufacturers and laboratories on all types of calculations.
The work carried out by ERLANGEN will make it easier to adapt to new architectures and thus facilitate future developments. In particular, it will be possible to test many different models, for example on boundary conditions, turbulence and collision models. All these algorithms will benefit both industrial and fundamental research.
The waLBerla and LaBS codes have been installed on the new European clusters, and a set of tools has been made available to analyze the behavior of the codes on these new clusters. The LUMI, KAROLINA and JUWELS clusters are a real asset for this project, in particular because they give access to ARM and GPU architectures.
These new architectures will make it possible to go further in the development of open-air type aircraft rotors or to carry out simulations on hydrogen safety.