Abstract
This paper presents a parallel strategy for assembling finite element matrices on a graphics processing unit (GPU). Because GPU memory is limited, the proposed strategy does not store the elemental matrices in memory; instead, it computes them on the fly and writes the data directly into the global stiffness matrix, which reduces the memory requirement and avoids the overhead of a separate assembly step. The global stiffness matrix is stored in the compressed sparse row (CSR) format, which is commonly used by GPU-accelerated linear solver libraries. Assembling elemental matrices directly into a sparse storage format, however, requires prior knowledge of the locations of the nonzeros. The current work presents an efficient strategy to pre-compute the indices for assembly into the CSR format. The proposed strategy has been implemented on both CPU and GPU. The performance of the proposed finite element solver is measured by solving a large-scale three-dimensional (3D) elasticity problem with up to 4.7 million degrees of freedom (DOFs). A comparison is made with the standard assembly implementation in the Eigen C++ library, which first stores the nonzero values as triplets and then assembles them into the CSR format. For the finest mesh with 4.7 million DOFs, the proposed CPU-based assembly strategy achieves a 9.3× speedup over the Eigen library. Computing the indices for assembly into the CSR format takes 15.7 s on the CPU and 2.4 s on the GPU for 4.7 million DOFs. The computation of elemental matrices and their assembly, implemented on the GPU as a single compute kernel, is up to 24.3× faster than the optimized CPU implementation. In terms of wall-clock time, the GPU-accelerated finite element solver achieves up to a 4× speedup over the CPU solver.
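The idea of pre-computing CSR indices and then assembling without stored elemental matrices can be illustrated with a small serial sketch. This is not the authors' implementation: the function names (`build_csr_pattern`, `precompute_positions`, `assemble`), the one-DOF-per-node connectivity, and the 1D bar stiffness used in the usage example are all assumptions made for illustration, and Python is used only for brevity.

```python
import bisect


def build_csr_pattern(num_dofs, elements):
    """Build (row_ptr, col_idx) from element connectivity (hypothetical helper)."""
    neighbours = [set() for _ in range(num_dofs)]
    for dofs in elements:
        for i in dofs:
            neighbours[i].update(dofs)  # every DOF pair in an element is a nonzero
    row_ptr, col_idx = [0], []
    for cols in neighbours:
        col_idx.extend(sorted(cols))    # sorted columns allow binary search later
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def precompute_positions(row_ptr, col_idx, elements):
    """For each element, find where each local entry (a, b) lives in the CSR value array."""
    positions = []
    for dofs in elements:
        elem_pos = []
        for i in dofs:
            start, end = row_ptr[i], row_ptr[i + 1]
            for j in dofs:
                # Binary search for column j inside row i of the pattern.
                elem_pos.append(bisect.bisect_left(col_idx, j, start, end))
        positions.append(elem_pos)
    return positions


def assemble(num_nonzeros, elements, positions, element_matrix):
    """Compute each elemental matrix on the fly and add it straight into the CSR values."""
    values = [0.0] * num_nonzeros
    for e, dofs in enumerate(elements):
        ke = element_matrix(e)          # used immediately, never stored per element
        n = len(dofs)
        for a in range(n):
            for b in range(n):
                values[positions[e][a * n + b]] += ke[a][b]
    return values


# Usage: three 1D bar elements on four nodes, unit stiffness (assumed example data).
elements = [[0, 1], [1, 2], [2, 3]]
row_ptr, col_idx = build_csr_pattern(4, elements)
positions = precompute_positions(row_ptr, col_idx, elements)
values = assemble(len(col_idx), elements, positions,
                  lambda e: [[1.0, -1.0], [-1.0, 1.0]])
```

Because the positions are computed once, the assembly loop needs no searching or sorting. A parallel version would additionally have to handle concurrent updates to entries shared between elements (for example with atomic additions or element coloring), which this serial sketch does not address.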
Acknowledgements
This work was supported by the Science and Engineering Research Board [IMP/2019/000276, SB/FTP/ETA-0008/2014].
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kiran, U., Gautam, S.S., Sharma, D. (2023). Accelerating Finite Element Assembly on a GPU. In: Sharma, R., Kannojiya, R., Garg, N., Gautam, S.S. (eds) Advances in Engineering Design. FLAME 2022. Lecture Notes in Mechanical Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-99-3033-3_4
DOI: https://doi.org/10.1007/978-981-99-3033-3_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-3032-6
Online ISBN: 978-981-99-3033-3
eBook Packages: Engineering, Engineering (R0)