CORDIS - EU research results

Pilot using Independent Local & Open Technologies

Periodic Reporting for period 1 - The European PILOT (Pilot using Independent Local & Open Technologies)

Reporting period: 2021-12-01 to 2022-09-30

The European PILOT project (EUPILOT) is creating accelerators (designed, implemented, manufactured, and deployed in Europe) to power pre-exascale systems.

The EUPILOT project aims to build an end-to-end demonstrator of accelerators that could be used in a pre-exascale system. The project will produce three chip tapeouts. The first will be a test chip to validate the use of the 12nm technology node. The second and third will contain a vector accelerator with up to 16 cores and a machine learning and stencil accelerator with up to eight cores, respectively.

The chips will be mounted alongside LPDDR memory into modules, which will be installed on accelerator boards to form systems. Paired with host servers, these systems will be deployed in liquid immersion tanks. Such tanks support high power densities with high energy efficiency and are a growing trend for the future of HPC.

EUPILOT contributes to a sustainable exascale HPC ecosystem in Europe, helping lay the groundwork for long-term technical independence by delivering an end-to-end proof of concept. The know-how gained, the boost in industrial competitiveness, and closer cooperation will all support the goal of establishing European digital autonomy.

On the hardware side, EUPILOT builds on and significantly scales up the advances made in EPI’s accelerator (EPAC), in the form of deeper scaling and integration of the HPC vector accelerator.

Evaluation and benchmarking of various applications and computational methods for potential acceleration with the VEC and MLS chips in the EUPILOT platform have started. Two applications have begun to be characterized: GROMACS and a set of dwarfs derived from EC-EARTH. The related work also aims to accelerate drug discovery processes considerably.

The efforts in AI libraries have been devoted to implementing and optimising the oneDNN library. The team has been working on the implementation of BLIS kernels for MLS. Another area of focus has been implementing some oneDNN kernels on the Verilator-based SDV. The team has successfully implemented a version of oneDNN targeting the VEC accelerator. With the current implementation, the ResNet model’s performance is around 40% of what can be expected.

In terms of AI frameworks, support for converting ONNX models to DaCe, together with an initial version of the MLS backend, has been released as v0.14, and a paper on it was presented at ISC 2022. The team has been developing a memory management solution and a version of TensorFlow that dynamically links with oneDNN, to feed Arax. Arax has been ported to a RISC-V QEMU environment. The team has provided a oneDNN library optimised for RISC-V to assist the integration with TensorFlow. Work has started on integrating Tarantella with DaCe/TensorFlow.

Co-design work has been performed to start developing tests that verify the functionality of OpenMPI's data transfer engine (DTE). The team has worked towards the final goal of porting and optimising the OpenMP runtime for VEC, with a focus on locality awareness and better energy efficiency.

Effort has been devoted to developing the first version of the TAMPI library, which manages all concurrent MPI requests internally.

Work was done on node- and cluster-level resource management, based on DROM and BBUE. The team has also worked on porting a recent Linux kernel and root file system.

In terms of tools, the team has been working on integrating the Fortran front-end of LLVM with the EPI compiler to pave the way to vectorisation.

The hardware team focused on two main areas in parallel.

Work on the upcoming tapeout of a testchip was fully under way. The purpose of this testchip is to characterize and debug all critical structures whose IP will be used in future chips on the GlobalFoundries 12LP+ node.

Work was performed on defining and implementing the uncore, including the C2C controller, the LPDDR controller, and the CXL controller (with their corresponding PHYs), as well as the power management controller, PLLs, etc. Most of this work will be included in the testchip.

Architectural work was performed for the accelerators. The suitability of applications and libraries has been determined for the co-design efforts (e.g. EC-Earth, GROMACS, quantum chemistry workflow, AI video processing, BLAS, etc.).

In terms of the memory hierarchy, design work was performed on cache improvements and feature upgrades in the intra-chip coherency mechanisms. In the RISC-V/VEC cores, performance increases can be expected from a 4x increase in the number of outstanding misses that can be handled. Work was also performed to extend AMBA 5 CHI. The first interface specification for the I/O coherent data-transfer engine (DTE) was created, along with the DMA engine.
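
To illustrate why handling more outstanding misses can raise performance, the sketch below applies Little's law: the memory bandwidth a core can sustain is bounded by the number of misses in flight times the cache-line size, divided by the memory latency. The function name and all numeric values (16 and 64 misses, 64-byte lines, 100 ns latency) are hypothetical and chosen only for illustration; they are not EUPILOT figures.

```python
def max_miss_bandwidth(outstanding_misses: int, line_bytes: int, latency_ns: float) -> float:
    """Upper bound on sustained memory bandwidth in bytes per second.

    Little's law: concurrency = throughput * latency, so
    throughput = (misses in flight * bytes per miss) / latency.
    """
    if outstanding_misses < 1 or line_bytes < 1 or latency_ns <= 0:
        raise ValueError("all parameters must be positive")
    return outstanding_misses * line_bytes / (latency_ns * 1e-9)


# Hypothetical values, NOT taken from the EUPILOT design.
LINE_BYTES = 64      # assumed cache-line size
LATENCY_NS = 100.0   # assumed memory latency

base = max_miss_bandwidth(16, LINE_BYTES, LATENCY_NS)      # assumed baseline
improved = max_miss_bandwidth(64, LINE_BYTES, LATENCY_NS)  # 4x the misses in flight

print(f"baseline bound: {base / 1e9:.2f} GB/s")
print(f"improved bound: {improved / 1e9:.2f} GB/s")
print(f"ratio: {improved / base:.1f}x")
```

Under these assumptions the bound scales linearly with the number of misses in flight. Real gains depend on whether the workload is actually limited by memory-level parallelism rather than by the memory system or the NoC.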

Work was performed to improve the VEC core from a 2-way in-order design to a 3-way out-of-order core. The interface between the core and the VPU (OVI) was improved to version 2, with changes in both the core and the VPU. The NoC was also improved with virtual channels, enabling inter-chip routing.

On the MLS side, improvements have been made to the integration of the SPU with the Snitch integer core, to the memory mapping of the SPU, and to other aspects of integration.

Verification efforts have been devoted to transitioning from version 0.7 to version 1.0 of the RISC-V vector (V) extension. Work has also started on multi-FPGA environments with C2C protocol extensions.

In the systems area, development of the testboard that will host the testchip has started, as has the definition of the system specifications and requirements.

Finally, specifications for the deployment and operation of liquid immersion cooling tanks were gathered and a site survey was performed for the deployment location at the BSC datacenter.

EUPILOT will deliver the first vectorizing compiler for the RISC-V vector extension that fully exploits the vector-length agnostic features of the ISA.
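
As a rough illustration of what "vector-length agnostic" means, the sketch below simulates a strip-mined loop in plain Python. On each iteration the loop asks how many elements it may process, mimicking in simplified form the role of the RISC-V `vsetvli` instruction, so the same code gives the same result whatever the hardware vector length. The function `vla_axpy` and the parameter `vlmax` are hypothetical names for this illustration; this is not the output or the method of the EUPILOT compiler.

```python
def vla_axpy(a, x, y, vlmax):
    """Compute a*x[i] + y[i] with a strip-mined, vector-length agnostic loop.

    vlmax stands in for the hardware's maximum vector length in elements
    (an assumption of this sketch). The loop never hard-codes it.
    """
    if vlmax < 1:
        raise ValueError("vlmax must be at least 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    n = len(x)
    out = [0] * n
    i = 0
    while i < n:
        # Simplified vsetvli behaviour: process min(remaining, vlmax) elements.
        vl = min(vlmax, n - i)
        # One simulated vector instruction operating on vl elements.
        for lane in range(vl):
            out[i + lane] = a * x[i + lane] + y[i + lane]
        i += vl
    return out


x = list(range(10))
y = [1] * 10

# The same code runs unchanged on "machines" with different vector lengths.
short_vectors = vla_axpy(2, x, y, vlmax=4)
long_vectors = vla_axpy(2, x, y, vlmax=256)
print(short_vectors == long_vectors)
```

The design point is that the vector length appears only as a run-time quantity, so a binary compiled this way does not need to be rebuilt for implementations with wider or narrower vector units.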

EUPILOT will provide implementations of relevant AI/ML frameworks, leveraging a pool of European technologies (e.g. DaCe and Tarantella), that will enable a broad class of applications.

The improvements over the prior VPU will be substantial, in addition to compatibility with version 1.0 of the RISC-V vector extension (RVV). The VEC core will be improved, and both accelerator chips will deliver performance and energy-efficiency gains as well as increased memory capacity and potentially higher memory throughput. The NoC will be improved to match the increase in memory bandwidth. The CXL controller brings improvements in latency and throughput.

The availability of computing platforms like EUPILOT, which bring accelerated HPC and AI/ML workloads to future exascale systems, has the potential to benefit society through faster drug discovery, protein folding and, in general, shorter times to scientific discovery. These advances have the potential to save billions and many years of development across many fields. The efforts aim for large-scale societal improvements by enabling users to model what was previously difficult to carry out physically, and to mitigate the impact of detrimental or catastrophic events on society at large.
EUPILOT: Top Level View