Abstract
A new direct linear equation solver for GPUs is proposed and applied to mechanical system analysis. In contrast to the DFS post-order traversal that is widely used in conventional implementations of supernodal and multifrontal methods, a BFS reverse-level order traversal is adopted to expose more parallelism and to allow more adaptive control of data size. The proposed implementation solves large problems efficiently on many kinds of GPUs. Separators are divided into smaller blocks to further improve parallel efficiency. Numerical experiments show that the proposed method generally requires less factorization time than CHOLMOD and has better operational availability than SPQR. A mechanical dynamic analysis has been carried out to demonstrate the efficiency of the proposed method. Computing time, memory usage, and solution accuracy are compared with those obtained from DSS included in MKL. Compared to an experimental CPU device, the GPU implementation is about 2.5–5.9 times faster in the numerical factorization step and approximately 1.9–4.7 times faster over the whole analysis process.
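The difference between the two traversal orders mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the tree is assumed to be stored as a parent array (with `-1` marking a root), and the helper names `build_children`, `dfs_postorder`, and `bfs_reverse_level_order` are hypothetical. A DFS post-order yields one sequential list of nodes, whereas a BFS reverse-level order groups nodes into levels whose members do not depend on each other, so each level could in principle be processed as one parallel batch.

```python
from collections import deque


def build_children(parent):
    """Return (children lists, roots) for a tree given as a parent array."""
    children = [[] for _ in parent]
    roots = []
    for node, p in enumerate(parent):
        if p < 0:
            roots.append(node)
        else:
            children[p].append(node)
    return children, roots


def dfs_postorder(parent):
    """Sequential order: every node appears after all of its descendants."""
    children, roots = build_children(parent)
    order = []
    stack = [(r, False) for r in reversed(roots)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
        else:
            stack.append((node, True))
            for c in reversed(children[node]):
                stack.append((c, False))
    return order


def bfs_reverse_level_order(parent):
    """Levels of the tree, deepest level first; nodes in a level are independent."""
    children, roots = build_children(parent)
    depth = [0] * len(parent)
    levels = []
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        d = depth[node]
        if d == len(levels):  # BFS visits depths in nondecreasing order
            levels.append([])
        levels[d].append(node)
        for c in children[node]:
            depth[c] = d + 1
            queue.append(c)
    return levels[::-1]


if __name__ == "__main__":
    # Hypothetical 7-node tree: 0 and 1 are children of 2, 3 and 4 are
    # children of 5, and 2 and 5 are children of the root 6.
    parent = [2, 2, 6, 5, 5, 6, -1]
    print(dfs_postorder(parent))            # [0, 1, 2, 3, 4, 5, 6]
    print(bfs_reverse_level_order(parent))  # [[0, 1, 3, 4], [2, 5], [6]]
```

In this toy tree the post-order gives seven sequential steps, while the reverse-level order gives three batches of sizes 4, 2, and 1. Both orders respect the rule that children are handled before their parent.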
Cite this article
Jung, J., Bae, D. Accelerating implicit integration in multi-body dynamics using GPU computing. Multibody Syst Dyn 42, 169–195 (2018). https://doi.org/10.1007/s11044-017-9588-1
DOI: https://doi.org/10.1007/s11044-017-9588-1