Novel data management for exascale supercomputing
As demand for exascale supercomputing grows, operators will have to increase access and workflow capacity, allowing more users to run increasingly diverse and complex applications. Exascale systems can perform a billion billion (10^18) calculations per second. Managing and storing the data that such systems generate is a significant challenge, as current storage systems reach their limits and operating systems struggle to cope. “Future applications will not be able to run using current storage paradigms,” says Philippe Deniel, head of the Storage Systems Lab at the French Alternative Energies and Atomic Energy Commission. As coordinator of the IO-SEA project, Deniel has led the development and implementation of a novel software solution that offers long-term storage able to meet increasing data demands. IO-SEA is one of three SEA projects, along with DEEP-SEA and RED-SEA, set up to develop complementary technologies for a modular European high-performance computing (HPC) architecture.
Storage solutions
A key challenge for exascale computing will be the evolution of how calculations are performed. Supercomputers rely on graphics processing units (GPUs), which are designed to break complex problems into thousands of tasks to be performed simultaneously. This means they also require a lot of memory. Underlying IO-SEA’s solution (known as a software stack, as it comprises several components) are innovative uses of hierarchical storage management (HSM), object stores and ‘ephemeral’ servers.
IO-SEA uses the data storage architecture known as ‘object storage’, in which data is managed as discrete objects, each containing the data itself, its metadata and a unique identifier. HSM offers a tiered storage approach which automatically identifies the best storage media for the application at hand, whether that be solid-state drives using non-volatile memory express (NVMe), non-volatile random-access memory (NVRAM), or even tape reels, which are prized in supercomputing for their low cost and low power requirements. This tiered structure ensures that frequently accessed data is kept on fast media, such as NVMe, with tape serving as longer-term storage. “For effective HSM, it’s also important to quickly identify files,” notes Deniel. “Our advanced monitoring mechanism collects data in a large database, which our artificial intelligence system accesses to make user recommendations, based on their behaviour.”
Finally, each storage server is offered on demand, dynamically scheduled for the duration of a computational job. Operators use a workflow management module to set up simulations, which are then automatically assigned to run on dedicated compute nodes. Once the results have been sent to the storage system, these servers ‘disappear’, and the nodes are released for the next operation.
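The tiered placement described above can be illustrated with a minimal sketch. This is not IO-SEA's code: the class `StoredObject`, the functions `choose_tier` and `rebalance`, the tier ordering and the access-count thresholds are all hypothetical, chosen only to show how an HSM policy might keep frequently accessed objects on fast media and leave rarely used ones on tape.

```python
from dataclasses import dataclass


@dataclass
class StoredObject:
    """A toy 'object': payload, metadata and a unique identifier."""
    object_id: str
    data: bytes
    metadata: dict
    accesses: int = 0      # how often the object was read recently
    tier: str = "tape"     # current storage tier (assumed default)


def choose_tier(accesses: int, hot: int = 10, warm: int = 3) -> str:
    """Pick a tier from an access count.

    The thresholds `hot` and `warm` are illustrative assumptions,
    not values taken from the IO-SEA project.
    """
    if accesses >= hot:
        return "nvram"
    if accesses >= warm:
        return "nvme"
    return "tape"


def rebalance(objects: list) -> dict:
    """Move each object to its target tier.

    Returns a mapping of object_id -> (old_tier, new_tier) for every
    object that actually changed tier.
    """
    moved = {}
    for obj in objects:
        target = choose_tier(obj.accesses)
        if target != obj.tier:
            moved[obj.object_id] = (obj.tier, target)
            obj.tier = target
    return moved
```

A real HSM policy would weigh far more than raw access counts (object size, age, user hints, recommendations from the monitoring database), but the shape is the same: observe usage, choose a target tier, migrate only what needs to move.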
Shared resources
Users access the IO-SEA system through a range of data access interfaces, including POSIX. The system was tested in a number of use cases, including electron microscopy, astrophysics, climatology and Earth system modelling (partnering with DEEP-SEA), quantum physics simulations and large-scale meteorology and weather forecasting. “Throughout, we demonstrated our solution’s capability to offer a paradigm shift from storage as static and unchanging, to being conceived of as a process, which is dynamic and shared,” adds Deniel. IO-SEA’s solution will be deployed as part of the EUPEX exascale prototype, to be launched within a couple of years. The software has been made freely available on the code-sharing site GitHub. The project was carried out with support from the European High Performance Computing Joint Undertaking (EuroHPC JU), an initiative set up to develop a world-class supercomputing ecosystem in Europe. “Despite being a collection of several products, our solution, co-designed by end users and system developers, introduces an integrated storage stack pointing the way forward for exascale computing,” concludes Deniel.
Keywords
IO-SEA, EuroHPC JU, exascale, HPC, supercomputing, memory, storage, resources, object store, hierarchical storage management, HSM, tape, NVMe