Periodic Reporting for period 2 - NEXTGenIO (Next Generation I/O for Exascale)
Reporting period: 2017-10-01 to 2019-09-30
One of the major roadblocks to achieving Exascale computing is the I/O bottleneck. Current systems are capable of processing data quickly, but overall speed is limited by how fast the system can read and write data. This represents a significant loss of time and energy in the system. Widening, and ultimately eliminating, this bottleneck would significantly increase the performance and efficiency of HPC systems.
NEXTGenIO has made great strides towards solving the problem by bridging the gap between memory and storage using Intel's revolutionary new Optane DC Persistent Memory, which sits between conventional memory and disk storage. NEXTGenIO has designed the hardware and software to exploit the new memory technology, and has built a system with 100x faster I/O than current HPC systems, a significant step towards Exascale computation.
The advances that Optane DC Persistent Memory and NEXTGenIO represent are transformational across the computing sector.
1. “Hardware platform prototype: a new prototype HPC hardware platform will be developed by Fujitsu utilising the latest NVDIMM and processor technology from Intel. There are many different ways of utilising this fascinating technology in computing systems. It is therefore our intention that the prototype hardware be of value not just in high-end HPC systems but also for general data centre usage. Demonstrating the prototype’s broad applicability is therefore a key objective.”
The NEXTGenIO hardware architecture was produced from a detailed requirement capture process in WP2, during the first half of the project. Following these requirements, the hardware platform was then built as part of WP6 and delivered to EPCC’s data centre in March 2019. The platform was used for in-depth testing of the co-design applications and systemware in order to demonstrate the broad applicability of the technology.
2. “Exascale I/O investigation: as the NVDIMM technology is new, the project will investigate different methods of utilising its functionality to support the most efficient I/O performance in HPC and data centre environments. This technology will be truly transformative in terms of HPC workloads. Understanding how best to utilise it is a key research objective of the project.”
Investigations were carried out into the different modes in which the NVDIMMs can operate and the different options for their usage, e.g. direct access using fsdax and PMDK, distributed file systems, object stores, or use as an extended memory hierarchy. The prototype was set up in a “mixed” configuration, with 1TB in Memory mode and 2TB in App Direct mode – this allows for the most efficient use of the platform overall, enabling the majority of workloads to run with only minor (SLURM-supported) reconfiguration of the compute nodes. The performance of the different approaches was tested and reported in the final project deliverables, in particular those from WPs 5, 6 and 7.
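To make the fsdax direct-access path concrete, the sketch below memory-maps a file on a DAX-mounted persistent-memory filesystem (the mount point /mnt/pmem and file name are assumptions for illustration), so that loads and stores reach the NVDIMM media without passing through the page cache. It uses plain Python mmap rather than the PMDK libraries mentioned above, which additionally provide fine-grained cache-flush primitives in C.

```python
# Minimal sketch of direct (load/store) access to persistent memory exposed
# through an fsdax-mounted filesystem. The path /mnt/pmem is a hypothetical
# mount point for a DAX namespace; on such a mount, the mmap below maps the
# NVDIMM media directly into the process address space.
import mmap
import os

PMEM_FILE = "/mnt/pmem/example.dat"   # assumed DAX-mounted namespace
SIZE = 4096

# Create and size the backing file on the persistent-memory filesystem.
fd = os.open(PMEM_FILE, os.O_CREAT | os.O_RDWR, 0o600)
os.ftruncate(fd, SIZE)

# Map it for load/store access; writes go straight to persistent memory.
buf = mmap.mmap(fd, SIZE, mmap.MAP_SHARED, mmap.PROT_READ | mmap.PROT_WRITE)
buf[0:13] = b"hello, nvram!"

# Flush the mapping so the data is durable before moving on.
buf.flush()

buf.close()
os.close(fd)
```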
3. “Systemware development: the software components that will support the use of the NVDIMM technology by applications, the operating system, and the debugging and performance tools will be developed. This will include I/O libraries, new energy and data aware schedulers, enhancements to programming model libraries, and tools development. The architectural design decisions will be taken using the results of the Exascale I/O investigation and the co-design process. Producing the necessary software to enable Exascale application execution on the hardware platform is therefore a key objective.”
The project developed a systemware stack to support the seamless use of NVRAM by applications. The SLURM scheduler was extended to be aware of the new memory/storage layer, to schedule complex workflows, and to make resource-management decisions that improve the efficiency of the system. A data scheduler was developed to transparently move data to and from the compute nodes. The PyCOMPSs programming model and the dataClay object store were both enhanced to work with NVRAM. The profiling and debugging tools were also extended to support performance analysis on the new hardware.
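As an illustration of the task-based programming model referred to above, the following is a minimal PyCOMPSs sketch using the standard @task decorator and compss_wait_on; it does not show the NVRAM-specific enhancements developed in the project, and the per-chunk computation is a placeholder.

```python
# Minimal PyCOMPSs sketch: tasks are declared with the @task decorator and the
# runtime schedules them across the available compute nodes. The NVRAM-aware
# extensions developed in NEXTGenIO sit underneath this model and are not
# shown here.
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on


@task(returns=int)
def process_chunk(chunk):
    # Placeholder computation standing in for a real per-chunk kernel.
    return sum(chunk)


def main():
    chunks = [list(range(i, i + 1000)) for i in range(0, 10000, 1000)]
    # Each call returns a future; the runtime executes tasks asynchronously.
    partials = [process_chunk(c) for c in chunks]
    # Synchronise and collect the results back on the master process.
    partials = compss_wait_on(partials)
    print("total:", sum(partials))


if __name__ == "__main__":
    main()
```

Such a script is launched with the runcompss command, which starts the COMPSs runtime and distributes the tasks across the allocated nodes.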
4. “Application co-design: any new I/O platforms need to meet the needs of today’s highly parallel applications as these will be tomorrow’s Exascale applications. Understanding individual applications’ I/O profiles and typical I/O workloads on shared systems running multiple different applications is key to ensuring the decisions we make in hardware design and the I/O models we investigate will be relevant to the real world. The use of co-design to inform these choices is therefore a key objective of this project.”
Co-design has been a strong and continued focus throughout the project, and usage scenarios and requirements have driven the architecture specifications for both the hardware and the software stack. A key design requirement for the systemware in particular was that applications should be able to benefit from the NEXTGenIO platform without requiring any changes to the applications themselves. The development of the Kronos workload generator and simulator helped evaluate the architectures across multiple different workload types and at large scale. Kronos is able to produce a wide range of synthetic workloads that represent different compute and I/O requirements and to execute them under varying configurations in order to assess the impact of NVRAM on performance.
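For illustration only, the toy sketch below shows the kind of synthetic workload description, mixed compute and I/O phases across a set of jobs, that a workload generator in the spirit of Kronos can produce. It is not Kronos's actual input format or API; the structure and field names are assumptions made for this example.

```python
# Illustrative sketch only: a toy synthetic-workload description with mixed
# compute and I/O phases, in the spirit of what a workload generator such as
# Kronos produces. This is NOT Kronos's actual input format or API.
import json
import random


def make_synthetic_job(n_phases, io_fraction):
    """Build a toy job as a list of phases, each either compute or write I/O."""
    phases = []
    for _ in range(n_phases):
        if random.random() < io_fraction:
            phases.append({"type": "write",
                           "bytes": random.choice([2**20, 2**24, 2**28])})
        else:
            phases.append({"type": "compute",
                           "flops": random.choice([10**9, 10**10])})
    return {"phases": phases}


# A small workload mix: I/O-heavy jobs alongside compute-heavy ones, which can
# then be replayed under different storage configurations (for example with
# and without NVRAM staging) to compare time-to-completion.
workload = ([make_synthetic_job(8, 0.6) for _ in range(4)] +
            [make_synthetic_job(8, 0.1) for _ in range(4)])
print(json.dumps(workload, indent=2))
```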
In summary, the NEXTGenIO project has firmly met all the objectives set out in the DoA and has exceeded expectations in the overall results achieved in terms of performance and usability.