Periodic Reporting for period 1 - SPARCITY (SparCity: An Optimization and Co-design Framework for Sparse Computation)
Reporting period: 2021-04-01 to 2022-09-30
• to develop a comprehensive characterization mechanism for sparse computations based on analytical and ML-based performance and energy models,
• to develop advanced node-level optimizations for sparse computation on modern parallel architectures,
• to devise topology-aware partitioning algorithms and communication optimizations for system-level parallelism,
• to create digital SuperTwins of supercomputers to evaluate what-if hardware scenarios,
• to demonstrate the effectiveness and usability of the SparCity framework by enhancing the efficiency of challenging real-life applications,
• to deliver a robust, well-supported and documented SparCity framework into the hands of end-users from industry and academia.
Overall, SparCity is a forward-looking project with a significant contribution to building Europe’s strengths in the application of HPC and related software tools, in the adoption of low energy processing technologies, and in the development of advanced software and services for its citizens.
In WP2, five groups of node-level optimizations were carried out. (i) To discover mixed-precision opportunities in sparse computations, we proposed row-wise mixed- and multi-precision SpMV methods that are suitable for both the CSR and ELLPACK-R storage formats. (ii) We evaluated the proposed ML-based models, and the results demonstrate their effectiveness on SpMV. (iii) We proposed a compiler and runtime system that takes advantage of the shared underlying processing pattern and data in order to reduce redundant computation and data access. (iv) For memory access regularization, we developed fast and high-quality CPU- and GPU-based influence maximization tools, HyperFuser and SuperFuser, which exploit fast vectorized instruction patterns on distributed multi-CPU and multi-GPU systems. (v) For the data and computation reordering task, we implemented an extended version of the Reverse Cuthill-McKee (RCM) algorithm that handles all sparse matrices, including non-symmetric and non-square ones.
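To make the idea of row-wise precision selection in item (i) concrete, the following is a minimal sketch of a CSR SpMV in which each row is computed either in emulated single precision or in double precision. It is not the project's implementation: the function names, the `threshold` parameter, and in particular the selection rule (a row goes to single precision when all of its nonzeros have magnitude at most `threshold`) are assumptions chosen only for illustration.

```python
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def choose_row_precision(row_ptr, values, threshold=1.0):
    """Return one flag per row: True means 'compute this row in single precision'.

    Hypothetical rule for illustration only: a row qualifies when every
    nonzero in it has magnitude <= threshold.
    """
    flags = []
    for r in range(len(row_ptr) - 1):
        row_vals = values[row_ptr[r]:row_ptr[r + 1]]
        flags.append(all(abs(v) <= threshold for v in row_vals))
    return flags


def mixed_precision_spmv(row_ptr, col_idx, values, x, single_row):
    """Compute y = A @ x for a CSR matrix, choosing the precision per row."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for r in range(n_rows):
        acc = 0.0
        if single_row[r]:
            # Emulate float32 storage and arithmetic by rounding every
            # operand and intermediate result to single precision.
            for k in range(row_ptr[r], row_ptr[r + 1]):
                prod = to_f32(to_f32(values[k]) * to_f32(x[col_idx[k]]))
                acc = to_f32(acc + prod)
        else:
            # Plain double-precision accumulation.
            for k in range(row_ptr[r], row_ptr[r + 1]):
                acc += values[k] * x[col_idx[k]]
        y[r] = acc
    return y


# Small 3x3 example in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [0.5, 0.25, 3.0, 0.1, 1000.5]
x = [1.0, 2.0, 4.0]

flags = choose_row_precision(row_ptr, values)
y = mixed_precision_spmv(row_ptr, col_idx, values, x, flags)
```

In this example only the first row satisfies the assumed rule, so it is accumulated in single precision while the other two rows stay in double precision. A real implementation would store the single-precision rows in a separate 32-bit array to reduce memory traffic, which is where the performance benefit of mixed precision comes from.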
WP3 is concerned with system-level static and dynamic optimizations for sparse computations, based on the principles of balancing the computational load and minimizing the impact of communication operations. We address these issues in two ways, first by developing and applying novel partitioning algorithms and second by ensuring that the available computation and communication resources can be used most efficiently by a given application.
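The two principles named above, balanced computational load and low communication, can be quantified for a given partition. The sketch below is an illustration under stated assumptions rather than the WP3 algorithms themselves: it assumes a square CSR matrix, a one-dimensional row partition in which the owner of row i also owns vector entry x[i], and a hypothetical helper name `partition_metrics`. Load is measured in nonzeros per part, and communication volume is the number of remote x entries that the parts must receive for one SpMV.

```python
def partition_metrics(row_ptr, col_idx, part, n_parts):
    """Evaluate a 1D row partition of a square CSR matrix for SpMV.

    part[i] is the part that owns row i and vector entry x[i].
    Returns (load per part, load imbalance, total communication volume).
    """
    n_rows = len(row_ptr) - 1
    load = [0] * n_parts
    needed = [set() for _ in range(n_parts)]  # remote x entries per part

    for r in range(n_rows):
        p = part[r]
        load[p] += row_ptr[r + 1] - row_ptr[r]
        for k in range(row_ptr[r], row_ptr[r + 1]):
            c = col_idx[k]
            if part[c] != p:
                # x[c] lives on another part and must be received once.
                needed[p].add(c)

    avg = sum(load) / n_parts
    imbalance = max(load) / avg if avg > 0 else 1.0
    comm_volume = sum(len(s) for s in needed)
    return load, imbalance, comm_volume


# 4x4 tridiagonal matrix in CSR form.
row_ptr = [0, 2, 5, 8, 10]
col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]

# Contiguous blocks of rows versus an interleaved assignment.
block = partition_metrics(row_ptr, col_idx, [0, 0, 1, 1], 2)
interleaved = partition_metrics(row_ptr, col_idx, [0, 1, 0, 1], 2)
```

Both partitions are perfectly balanced in terms of nonzeros, but the contiguous one needs only two remote vector entries while the interleaved one needs four, which is the kind of difference a partitioning algorithm tries to exploit.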
WP4 focuses on the design and implementation of the digital twin, SuperTwin, and the preprocessing library, SparseBase. SparseBase has reached release v0.2; users can download and use it from https://github.com/sparcityeu/sparsebase. The main functionalities of SuperTwin, i.e. probing (see D3.1), benchmarking (see D4.1/D4.2), metric-data storage, and monitoring, are now implemented in the single-node setting.
WP5 focuses on four real-life applications that use sparse data structures and/or perform sparse computations. For the simulation of cardiac electrophysiology, we created new realistic meshes, carried out code optimization, ported a simple version of the code to Graphcore IPUs, and implemented an improved numerical algorithm. For detecting wildfires on social networks, we developed methodologies to build huge networks that contain acquaintance relations among social network users; one example is a large interaction network with more than 1.6 billion edges based on COVID-19 related conversations. For epistasis detection, we developed the novel search algorithms needed for high-order detection, together with high-performance implementations of these algorithms for CPUs, GPUs and Graphcore IPUs. For autonomous driving, we implemented multiple-object tracking using TensorFlow 2, involving graph neural network (GNN) layers. Furthermore, the initial version of an open repository of sparse problem instances has been created.
Since SpMV is one of the most widely used sparse computation kernels, the node-level optimizations focused on improving its performance. We have developed a simple mixed-precision method for SpMV that works with the CSR and ELLPACK-R storage formats. In addition to developing ML models that can predict and optimize SpMV execution performance, we have identified the combinations of SpMV algorithms and CPU architectures that work well together and provided a theoretical explanation for the observed performance. For graph algorithms, we developed two CPU- and GPU-based influence maximization tools, HyperFuser and SuperFuser, which are several times faster than existing implementations and scale well to multiple GPUs.
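For readers unfamiliar with the second storage format mentioned above, the following sketch converts a CSR matrix to ELLPACK-R and runs SpMV on it. ELLPACK-R pads every row to the length of the longest row, stores the padded arrays column by column, and keeps an extra array of true row lengths so that the kernel can skip the padding. The function names are hypothetical, and the code is a plain sequential illustration, not the optimized kernels developed in the project.

```python
def csr_to_ellpack_r(row_ptr, col_idx, values):
    """Convert CSR arrays to ELLPACK-R (padded, column-major, plus row lengths)."""
    n_rows = len(row_ptr) - 1
    row_len = [row_ptr[r + 1] - row_ptr[r] for r in range(n_rows)]
    width = max(row_len, default=0)

    # Slot j of row r is stored at index j * n_rows + r, so consecutive
    # rows are adjacent in memory (coalesced access on GPUs).
    ell_val = [0.0] * (width * n_rows)
    ell_col = [0] * (width * n_rows)
    for r in range(n_rows):
        for j in range(row_len[r]):
            k = row_ptr[r] + j
            ell_val[j * n_rows + r] = values[k]
            ell_col[j * n_rows + r] = col_idx[k]
    return ell_val, ell_col, row_len


def ellpack_r_spmv(ell_val, ell_col, row_len, x):
    """Compute y = A @ x, visiting only the real entries of each row."""
    n_rows = len(row_len)
    y = [0.0] * n_rows
    for r in range(n_rows):
        acc = 0.0
        for j in range(row_len[r]):  # row_len lets us skip the padding
            idx = j * n_rows + r
            acc += ell_val[idx] * x[ell_col[idx]]
        y[r] = acc
    return y


# 3x3 example with rows of length 2, 1 and 3.
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 1, 0, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

ell_val, ell_col, row_len = csr_to_ellpack_r(row_ptr, col_idx, values)
y = ellpack_r_spmv(ell_val, ell_col, row_len, [1.0, 1.0, 1.0])
```

The padding costs memory when row lengths vary widely, which is one reason the best format depends on both the matrix and the target architecture.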
The four real-world applications have all adopted Graphcore IPUs, which greatly expands the current application horizon of this "AI processor". Moreover, novel algorithms were developed in the contexts of social network analysis and epistasis detection; these may lead to scientific breakthroughs. In particular, the efficiency of fourth-order epistasis detection exceeds the state of the art by 12.4x.
Lastly, we are advancing the state of the art by developing the most comprehensive sparse preprocessing library, SparseBase, and a performance monitoring, analysis and visualization tool, SuperTwin. We believe these tools will advance the state of the art in their respective areas and help HPC engineers and researchers working on sparse data to do the same in their own domains.