Sparse matrix operations are widely used in computational science and engineering applications such as quantum chemistry and finite element analysis, as well as in modern machine learning scenarios such as social network analysis and compressed deep neural networks. In the well-known article ‘A View of the Parallel Computing Landscape’, Asanovic et al. (2009) of the University of California, Berkeley listed sparse matrix computations as one of the most important parallel computing patterns. In recent decades, exploiting massively parallel computing platforms for highly scalable, performant, and practical sparse matrix computations has remained a challenging open problem.
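To make this pattern concrete before turning to the individual papers, the sketch below shows the classic sparse matrix-vector multiplication (SpMV) kernel in the compressed sparse row (CSR) format, the basic building block underlying many of the computations surveyed in this issue. It is a minimal illustrative sketch in Python with made-up matrix values, not code from any of the papers.

```python
import numpy as np

def csr_spmv(indptr, indices, data, x):
    """y = A @ x for a CSR matrix A given by (indptr, indices, data)."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(indptr) - 1):          # one output row per iteration
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]   # nonzero value times vector entry
    return y

# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]] stored in CSR form (illustrative values)
indptr  = np.array([0, 2, 3, 5])
indices = np.array([0, 2, 1, 0, 2])
data    = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
x = np.array([1.0, 2.0, 3.0])
print(csr_spmv(indptr, indices, data, x))     # [ 7.  6. 17.]
```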

Eight invited papers were selected for this special issue through a peer-review procedure. They cover several aspects of the high performance sparse matrix computations mentioned above: architecture, algorithms, and applications.

The first part of the special issue focuses on exploring new architectural and compilation techniques for matrix computations. The two papers propose a fast matrix multiplication architecture on field programmable gate arrays (FPGAs), and several compilation optimizations for sparse tensor algebra, respectively.

  • In the first paper, Bessant et al. (2023) propose a parallel multiplication architecture based on the Strassen algorithm and the Urdhva Tiryagbhyam multiplier, which enables efficient parallel matrix multiplication with a flexible implementation on FPGA devices. The architecture incorporates block scheduling, operations on processing elements, block size determination, parallelization, and double buffering for the storage of matrix elements (the Strassen splitting step is sketched after this list).

  • In the second paper, Zhang et al. (2023) present Sgap, a strategy built on segment group and atomic parallelism, which resolves two challenges in elevating flexible reduction semantics to sparse tensor algebra compilation: (1) a static synchronization granularity wastes parallelism, and (2) a static reduction strategy limits exploration of the optimization space. They use GPU-accelerated sparse matrix-dense matrix multiplication (SpMM) as a case study to demonstrate the effectiveness of segment group in reduction semantics elevation, achieving clear speedups over existing work (see the scatter-reduction sketch below).
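For readers unfamiliar with the Strassen algorithm used in the first paper, here is a minimal single-level recursion sketch in Python. It only illustrates the seven-multiplication split for square matrices whose size is a power of two; it does not reproduce the paper's FPGA architecture.

```python
import numpy as np

def strassen(A, B):
    """Strassen recursion for square matrices with power-of-two size."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven recursive products instead of the naive eight
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A = np.random.rand(4, 4); B = np.random.rand(4, 4)
assert np.allclose(strassen(A, B), A @ B)
```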
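The reduction semantics targeted by Sgap in the second paper can be illustrated, in spirit, by a scatter-style SpMM: each nonzero contributes a scaled row of the dense matrix to one output row, and concurrent contributions must be summed. The sketch below uses NumPy's unbuffered np.add.at as a stand-in for GPU atomic accumulation; it is a toy model of the semantics, not the authors' compiler-generated code.

```python
import numpy as np
import scipy.sparse as sp

def spmm_atomic(A, B):
    """C = A @ B with per-nonzero scattered accumulation,
    mimicking unordered GPU atomic adds."""
    coo = A.tocoo()
    C = np.zeros((A.shape[0], B.shape[1]))
    # Each nonzero (i, j, v) adds v * B[j, :] into row i of C.
    # np.add.at applies unbuffered addition, so repeated row indices
    # are summed correctly, just as atomics resolve concurrent updates.
    np.add.at(C, coo.row, coo.data[:, None] * B[coo.col])
    return C

A = sp.random(5, 4, density=0.4, format="csr", random_state=0)
B = np.random.rand(4, 3)
assert np.allclose(spmm_atomic(A, B), A @ B)
```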

The second part of the special issue focuses on optimization techniques in sparse algorithms. The three papers propose new optimization strategies for the eigenvalue problem, sparse triangular solve (SpTRSV), and sparse approximate inverse preconditioning, respectively.

  • In the first paper, Li et al. (2022) propose a novel parallel structured divide-and-conquer (DC) algorithm for symmetric banded eigenvalue problems, denoted PBSDC, which computes the eigenpairs directly without tridiagonalization. They compare their work with PBDC and ELPA through numerous experiments on the Tianhe-2 supercomputer. For matrices with many deflations and/or small bandwidths, PBSDC can be considerably faster than the tridiagonalization-based DC implemented in LAPACK and ELPA (the classic DC splitting step is sketched after this list).

  • In the second paper, Lu and Liu (2023) propose a tiled algorithm called TileSpTRSV for optimizing SpTRSV on GPUs by exploiting the 2D spatial structure of sparse matrices. They design two algorithm implementations, TileSpTRSV level-set and TileSpTRSV sync-free, on top of the level-set and sync-free schemes, respectively (a minimal level-set sketch follows this list). Tests on a group of representative matrices show clear performance advantages over the cuSPARSE, Sync-free, and Recblock algorithms.

  • In the third paper, Gao et al. (2023) present a new heuristic sparse approximate inverse (SPAI) preconditioning algorithm on GPUs, called HeuriSPAI. HeuriSPAI fuses the advantages of static and dynamic SPAI preconditioning algorithms and alleviates the drawback that existing dynamic SPAI preconditioning methods are unsuitable for large matrices (the least-squares core of SPAI is sketched after this list). Experimental results show that HeuriSPAI outperforms the popular preconditioning algorithms in three public libraries, i.e., cuSPARSE, MAGMA, and ViennaCL, as well as a recent parallel static SPAI preconditioning algorithm.
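The divide-and-conquer idea behind PBSDC is easiest to see on the simplest banded case, a symmetric tridiagonal matrix: the matrix splits into two independent blocks plus a low-rank correction, which is what makes the recursion parallel. The following is a generic textbook illustration of that splitting step, not the PBSDC algorithm itself.

```python
import numpy as np

def dc_split(T, k):
    """Split symmetric tridiagonal T at index k into a block-diagonal
    part plus a rank-one correction (the classic DC divide step)."""
    b = T[k, k + 1]                      # coupling element across the split
    T1 = T[:k + 1, :k + 1].copy()
    T2 = T[k + 1:, k + 1:].copy()
    T1[-1, -1] -= b                      # cancel the coupling on the diagonal
    T2[0, 0] -= b
    v = np.zeros(T.shape[0]); v[k] = v[k + 1] = 1.0
    return T1, T2, b, v                  # T = diag(T1, T2) + b * outer(v, v)

n = 6
main = np.arange(1.0, n + 1); off = np.full(n - 1, 0.5)
T = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
T1, T2, b, v = dc_split(T, n // 2 - 1)
rebuilt = np.block([[T1, np.zeros((T1.shape[0], T2.shape[0]))],
                    [np.zeros((T2.shape[0], T1.shape[0])), T2]]) + b * np.outer(v, v)
assert np.allclose(rebuilt, T)           # the two halves can now be solved independently
```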
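The level-set scheme that TileSpTRSV builds on can be summarized compactly: rows of a lower triangular system are grouped into "levels" such that every row depends only on rows in earlier levels, so all rows within one level can be solved in parallel. Below is a minimal serial emulation in Python, where the parallelism is only indicated by the level grouping; it is a sketch of the scheme, not the paper's tiled GPU implementation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve_triangular

def levelset_sptrsv(L, b):
    """Solve L x = b (L lower triangular, CSR) level by level."""
    n = L.shape[0]
    level = np.zeros(n, dtype=int)
    for i in range(n):                        # level = longest dependency chain
        cols = L.indices[L.indptr[i]:L.indptr[i + 1]]
        deps = cols[cols < i]
        if deps.size:
            level[i] = level[deps].max() + 1
    x = np.zeros(n)
    for lvl in range(level.max() + 1):
        for i in np.where(level == lvl)[0]:   # rows in one level are independent;
            lo, hi = L.indptr[i], L.indptr[i + 1]  # a GPU solves them concurrently
            cols, vals = L.indices[lo:hi], L.data[lo:hi]
            diag = vals[cols == i][0]
            off = cols < i
            x[i] = (b[i] - vals[off] @ x[cols[off]]) / diag
    return x

A = sp.random(8, 8, density=0.3, format="csr", random_state=1)
L = (sp.tril(A) + sp.eye(8)).tocsr()          # ensure a nonzero diagonal
b = np.random.rand(8)
assert np.allclose(levelset_sptrsv(L, b), spsolve_triangular(L, b, lower=True))
```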
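The core of any SPAI preconditioner, static or dynamic, is a set of small independent least-squares problems: column k of the approximate inverse M minimizes ||A m_k - e_k||_2 with m_k restricted to a prescribed sparsity pattern. The sketch below shows the static-pattern version with a hypothetical tridiagonal pattern; HeuriSPAI's heuristic pattern selection is not reproduced here.

```python
import numpy as np

def spai_static(A, pattern):
    """Columnwise static SPAI: minimize ||A m_k - e_k||_2 with m_k
    restricted to the index set pattern[k]. Each column is independent,
    which makes SPAI construction embarrassingly parallel."""
    n = A.shape[0]
    M = np.zeros((n, n))
    for k in range(n):
        J = pattern[k]                 # allowed nonzero rows of column k
        e = np.zeros(n); e[k] = 1.0
        # Small dense least-squares problem over the selected columns of A
        m, *_ = np.linalg.lstsq(A[:, J], e, rcond=None)
        M[J, k] = m
    return M

rng = np.random.default_rng(0)
A = np.eye(6) * 4 + rng.random((6, 6)) * 0.5           # diagonally dominant test
pattern = [[j for j in (k - 1, k, k + 1) if 0 <= j < 6] for k in range(6)]
M = spai_static(A, pattern)
print(np.linalg.norm(A @ M - np.eye(6)))               # well below ||I||_F = sqrt(6)
```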

The third part of the special issue focuses on applications of sparse matrix computations. The first two studies propose software packages for solving large-scale eigenvalue problems and sparse linear equations, respectively. The third proposes an improved multistage preconditioner for compositional reservoir simulation.

  • In the first paper, Li et al. (2023) introduce several strategies to improve the efficiency and scalability of the generalized conjugate gradient algorithm and build the GCGE package for solving large-scale eigenvalue problems. The method combines the damping idea, a subspace projection method, and the inverse power algorithm with dynamic shifts (the shifted inverse power step is sketched after this list). Numerical results demonstrate the efficiency, stability, and scalability of this work for computing many eigenpairs of large symmetric matrices arising from applications.

  • In the second paper, Li et al. (2023) use error-free transformation technology and mixed-precision ideas to construct XHYPRE, a reliable parallel numerical algorithm framework based on HYPRE that solves large-scale sparse linear equations with improved accuracy and accelerated numerical calculations (a mixed-precision refinement sketch follows this list). Experimental results demonstrate that XHYPRE is more reliable and effective than HYPRE and reduces the number of iterations.

  • In the third paper, Zhao et al. (2023) develop an efficient multistage preconditioner for fully implicit compositional flow simulation. The method employs an adaptive setup phase to improve parallel efficiency on GPUs. Furthermore, a multicolor Gauss-Seidel algorithm is applied in the algebraic multigrid methods for the pressure part (a two-color sketch follows this list). Numerical results demonstrate that the proposed method achieves good speedups while yielding the same convergence behavior.
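One of the GCGE ingredients, the inverse power algorithm with dynamic shifts, is easy to state on its own: repeatedly solve a shifted system and update the shift from the Rayleigh quotient of the current iterate. The dense toy sketch below illustrates only this ingredient and is unrelated to the package's parallel implementation.

```python
import numpy as np

def shifted_inverse_power(A, sigma, iters=15):
    """Inverse power iteration with a dynamically updated (Rayleigh) shift;
    converges to the eigenpair nearest the initial shift sigma."""
    n = A.shape[0]
    x = np.random.default_rng(1).random(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = np.linalg.solve(A - sigma * np.eye(n), x)   # shifted solve
        x = y / np.linalg.norm(y)
        sigma = x @ A @ x                               # dynamic shift update
    return sigma, x

A = np.diag([1.0, 3.0, 7.0]) + 0.1                      # symmetric test matrix
val, vec = shifted_inverse_power(A, sigma=2.5)
print(val, np.linalg.norm(A @ vec - val * vec))         # eigen-residual near zero
```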
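The mixed-precision idea behind XHYPRE can be illustrated with classic iterative refinement: perform the expensive solve in low precision, but compute residuals and accumulate corrections in high precision. The dense NumPy sketch below shows only this principle; XHYPRE itself targets sparse systems and additionally uses error-free transformations, which are not shown.

```python
import numpy as np

def mixed_precision_refine(A, b, steps=5):
    """Iterative refinement: float32 solves, float64 residuals.
    The cheap low-precision solve does the heavy lifting; the
    high-precision residual recovers accuracy lost to float32 rounding."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(steps):
        r = b - A @ x                                   # residual in float64
        d = np.linalg.solve(A32, r.astype(np.float32))  # correction in float32
        x += d.astype(np.float64)
    return x

rng = np.random.default_rng(2)
A = rng.random((50, 50)) + 50 * np.eye(50)              # well-conditioned test
b = rng.random(50)
x = mixed_precision_refine(A, b)
print(np.linalg.norm(A @ x - b))                        # near float64 accuracy
```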
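The multicolor Gauss-Seidel smoother in the third paper relies on a simple observation: if the unknowns are colored so that no two coupled unknowns share a color, all unknowns of one color can be updated simultaneously. Below is a generic red-black (two-color) sketch for a 1D Poisson system; the paper applies the multicolor idea inside algebraic multigrid on GPUs, which this toy does not capture.

```python
import numpy as np

def redblack_gauss_seidel(A, b, sweeps=200):
    """Two-color Gauss-Seidel for a tridiagonal system: even-indexed
    unknowns couple only to odd ones and vice versa, so each color
    can be updated in parallel (vectorized here)."""
    n = len(b); x = np.zeros(n)
    even, odd = np.arange(0, n, 2), np.arange(1, n, 2)
    for _ in range(sweeps):
        for color in (even, odd):
            # x[i] = (b[i] - sum_{j != i} A[i, j] x[j]) / A[i, i],
            # evaluated for all i of the current color at once
            r = b[color] - A[color] @ x + A[color, color] * x[color]
            x[color] = r / A[color, color]
    return x

n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)    # 1D Poisson matrix
b = np.ones(n)
x = redblack_gauss_seidel(A, b)
print(np.linalg.norm(A @ x - b))                        # small after enough sweeps
```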

We would like to take this opportunity to thank all the authors and reviewers for their splendid contributions to this special issue of CCF THPC.