Abstract
Krylov subspace solvers such as GMRES, combined with preconditioners such as incomplete LU (ILU), are the most commonly used methods for efficiently solving general-purpose, large-scale linear systems in simulations. Fully exploiting the increasing parallelism of modern hardware requires parallel Krylov subspace solvers and preconditioners that scale well. Such solvers are crucial for productivity because they give the domain expert an easy-to-use, high-level abstraction that hides the details of a complex hybrid parallel implementation. However, the ILU factorization and the subsequent triangular solves are sequential in their basic form. We use a multilevel nested dissection (MLND) ordering to address this issue and expose some parallelism. We investigate the parallel efficiency of a hybrid parallel ILU preconditioner that combines a restricted additive Schwarz (RAS) method on the process level with a shared-memory parallel MLND Crout ILU method on the thread level. We employ the PGAS-based programming model GASPI to implement the data exchange across processes efficiently. Using the convection-diffusion problem as a representative of a large class of engineering problems, we demonstrate the scalability of our approach up to 64 sockets (1280 cores) and show baseline performance comparable to that of the linear solver library PETSc. The RAS-preconditioned GMRES solver achieves about 80% parallel efficiency on 1280 cores. Our implementation provides a generic, algebraic, scalable, and efficient preconditioner that makes the domain expert productive in solving large-scale sparse linear systems.
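To make the sequential bottleneck and the effect of a nested dissection ordering concrete, the following minimal Python sketch counts the dependency levels of a forward substitution on a toy problem. It is an illustration under stated assumptions, not the MLND Crout ILU implementation described in the paper: the matrix is assumed to be tridiagonal (a 1D chain of unknowns), the factor is assumed to keep the zero fill-in pattern of the matrix, and the function names `nested_dissection_order` and `solve_levels` are hypothetical.

```python
"""Toy illustration: dependency levels of a sparse lower-triangular solve.

Assumptions (not from the paper): a 1D chain of unknowns (tridiagonal
matrix) and a factor L that keeps the zero fill-in pattern of the matrix.
"""


def nested_dissection_order(lo, hi):
    """Order chain nodes lo..hi: left part, right part, then the separator."""
    if lo > hi:
        return []
    mid = (lo + hi) // 2  # a single node separates a 1D chain
    return (nested_dissection_order(lo, mid - 1)
            + nested_dissection_order(mid + 1, hi)
            + [mid])


def solve_levels(order):
    """Return the number of dependency levels in forward substitution.

    `order` lists the chain nodes 0..n-1 in elimination order. A row of L
    depends on every chain neighbour that was ordered before it. Rows on
    the same level do not depend on each other and could be solved
    concurrently.
    """
    n = len(order)
    position = {node: k for k, node in enumerate(order)}
    level = {}
    for node in order:
        earlier = [nb for nb in (node - 1, node + 1)
                   if 0 <= nb < n and position[nb] < position[node]]
        level[node] = 1 + max((level[nb] for nb in earlier), default=0)
    return max(level.values())


n = 15
natural_levels = solve_levels(list(range(n)))
nd_levels = solve_levels(nested_dissection_order(0, n - 1))
print(natural_levels, nd_levels)
```

For this 15-node chain, the natural ordering yields 15 levels (every row waits for the previous one), whereas the dissected ordering yields 2. The reduction on matrices from two- or three-dimensional problems, and with fill-in retained in the factor, will generally be less dramatic.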
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Ram, R., Grünewald, D., Gauger, N.R. (2022). Hybrid Parallel ILU Preconditioner in Linear Solver Library GaspiLS. In: Varbanescu, AL., Bhatele, A., Luszczek, P., Marc, B. (eds) High Performance Computing. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13289. Springer, Cham. https://doi.org/10.1007/978-3-031-07312-0_17
DOI: https://doi.org/10.1007/978-3-031-07312-0_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07311-3
Online ISBN: 978-3-031-07312-0
eBook Packages: Computer Science (R0)