Helping next-gen supercomputers keep their cool
While exascale supercomputers promise to accelerate scientific discovery through unprecedented computational power (performing over a billion billion calculations per second), they also promise to consume a colossal amount of energy – and produce a great deal of heat in the process. “Certain technology gaps still need to be bridged,” explains TEXTAROSSA project coordinator Massimo Celino from the Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA). “These gaps include achieving energy efficiency and thermal control, as well as computational efficiencies. New methods and tools for seamlessly integrating accelerators in high-performance computing multi-node platforms are also needed.”
Supercooling supercomputers
Unlike a home computer, a multi-node exascale system distributes its workload across many independent nodes, each of which can be a complete computer in its own right. This architecture is designed to optimise the performance, scalability and availability of services and applications. The TEXTAROSSA project has developed a range of tools and technologies that can be integrated into these exascale platforms. The project was carried out with support from the European High Performance Computing Joint Undertaking (EuroHPC JU), an initiative set up to develop a world-class supercomputing ecosystem in Europe. “A key technology we developed was an innovative new cooling system,” says Celino. Because the circuitry in thousands of central processing units (CPUs) is so densely packed, high-performance computing (HPC) systems generate huge amounts of heat. Without adequate measures to remove that heat, such as housing the machines in a climate-controlled room, they would eventually become so hot that they fail. Yet these cooling strategies can themselves be hugely energy-intensive. While home computers typically rely on fans to cool their chips with air, the TEXTAROSSA project opted for a more efficient fluid-based approach. “An efficient two-phase cooling technology was developed, where a special fluid is pumped through a closed-loop circuit, which goes around the CPUs,” explains Celino. Working like a refrigerator, the fluid changes phase from liquid to vapour and back again, absorbing heat from the CPUs and carrying it safely away. Because the CPUs generate more heat the harder they work, the cooling technology was coupled with an automated thermal control strategy that adjusts the intensity of the cooling according to how hard the CPUs are working.
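The article does not describe how TEXTAROSSA's thermal control strategy decides how much cooling to apply. The idea of matching cooling intensity to CPU activity can be sketched with a simple controller, shown below. This is a minimal illustration, not the project's method. It assumes a pumped loop whose pump speed is set as a duty cycle between 0 and 1. It combines a feed-forward term driven by CPU load with a proportional correction on the temperature error. The class name `CoolingController`, the 60 °C setpoint, the gain and the duty limits are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CoolingController:
    """Hypothetical controller for a pumped two-phase cooling loop.

    All names, setpoints and gains are illustrative assumptions,
    not values from the TEXTAROSSA project.
    """
    target_temp_c: float = 60.0  # assumed CPU temperature setpoint (deg C)
    gain: float = 0.05           # assumed proportional gain (duty per deg C)
    min_duty: float = 0.2        # keep some coolant flowing even when idle
    max_duty: float = 1.0        # pump running at full speed

    def pump_duty(self, cpu_temp_c: float, cpu_load: float) -> float:
        """Return a pump duty cycle clamped to [min_duty, max_duty].

        cpu_load is the fraction of CPU capacity in use (0.0 to 1.0).
        """
        # Feed-forward: more CPU work means more heat, so raise flow in advance.
        feed_forward = self.min_duty + (self.max_duty - self.min_duty) * cpu_load
        # Proportional correction: push harder if the CPU runs above the setpoint.
        correction = self.gain * (cpu_temp_c - self.target_temp_c)
        return max(self.min_duty, min(self.max_duty, feed_forward + correction))


if __name__ == "__main__":
    controller = CoolingController()
    for temp, load in [(45.0, 0.1), (60.0, 0.5), (75.0, 0.9)]:
        print(f"temp={temp} load={load} duty={controller.pump_duty(temp, load):.2f}")
```

In this sketch, an idle, cool CPU keeps the pump at its minimum flow. A busy, hot CPU drives it to full speed. The feed-forward term reacts to rising load before the temperature has had time to climb. A real system would likely use measured heat flux, vapour quality or more sophisticated control, which the article does not specify.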
Smaller is better
The project team also developed new system software designed to manage computational tasks more effectively and improve operational efficiency. “This software offers greater speed in data management and movement, and the seamless management of computational tasks to optimise their distribution over the CPUs,” adds Celino. “This enables users to exploit impressive computing power in large parallel runs.” The next step was to develop two different computational node architectures for use in future HPC infrastructures. Two prototypes were built using different types of state-of-the-art CPUs, both exploiting the tools developed through the TEXTAROSSA project. Several applications, including AI and high-performance data analytics workloads, were run on these prototypes, allowing the project team to measure any gains in computational and energy efficiency. “We wanted to assess whether our techniques could be a useful means of ensuring that supercomputers don’t overheat, while still being able to deliver computational capacity for real-life applications,” notes Celino. The results confirmed the feasibility of both prototype systems, the efficacy of the individual software tools and the effectiveness of the cooling technology. “The University of Turin asked for a prototype platform to be built, so we actually now have three,” says Celino. “The challenge now is to further improve these prototypes and work on the miniaturisation and engineering of the technology.”
Keywords
TEXTAROSSA, EuroHPC JU, computing, heat, efficiency, cooling, fluid, phase change, exascale, supercomputers, HPC, CPU