Accelerating implicit integration in multi-body dynamics using GPU computing

Jung, Jihyun; Bae, Daesung

doi:10.1007/s11044-017-9588-1

Published: 07 August 2017

Volume 42, pages 169–195, (2018)

Multibody System Dynamics

Jihyun Jung¹ &
Daesung Bae¹

Abstract

A new direct linear equation solver for GPUs is proposed and applied to mechanical system analysis. Conventional implementations of supernodal and multifrontal methods widely use a DFS post-order traversal; the proposed solver instead adopts a BFS reverse-level-order traversal to expose more parallelism and to allow more adaptive control of data size. This implementation allows large problems to be solved efficiently on many kinds of GPUs. Separators are divided into smaller blocks to further improve parallel efficiency. Numerical experiments show that the proposed method generally requires less factorization time than CHOLMOD and has better operational availability than SPQR. A mechanical dynamic analysis has been carried out to demonstrate the efficiency of the proposed method, and its computing time, memory usage, and solution accuracy are compared with those obtained from the DSS included in MKL. Compared to an experimental CPU device, the GPU implementation is about 2.5–5.9 times faster in the numerical factorization step and approximately 1.9–4.7 times faster over the whole analysis process.
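
The difference between the two traversal orders named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the parent-array representation of the elimination tree, the function names, and the seven-node example tree are all assumptions made for illustration. The sketch groups tree nodes by depth and returns the deepest level first, so every node in a level has all of its children in an earlier level and the nodes within a level can, in principle, be factorized concurrently. A DFS post-order is included for contrast; it yields a single sequential order.

```python
from collections import deque


def build_children(parent):
    """Return (children lists, roots) for a tree given as a parent array.

    parent[i] is the parent of node i, or -1 if node i is a root.
    This representation is an assumption made for this sketch.
    """
    children = [[] for _ in parent]
    roots = []
    for node, p in enumerate(parent):
        if p < 0:
            roots.append(node)
        else:
            children[p].append(node)
    return children, roots


def reverse_level_order(parent):
    """BFS from the roots, then return the levels deepest-first."""
    children, roots = build_children(parent)
    depth = [0] * len(parent)
    levels = []
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        d = depth[node]
        if d == len(levels):
            levels.append([])  # first node seen at this depth
        levels[d].append(node)
        for child in children[node]:
            depth[child] = d + 1
            queue.append(child)
    return levels[::-1]


def post_order(parent):
    """Iterative DFS post-order: each node appears after all its descendants."""
    children, roots = build_children(parent)
    order = []
    stack = [(root, False) for root in reversed(roots)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
        else:
            stack.append((node, True))
            for child in reversed(children[node]):
                stack.append((child, False))
    return order


# Hypothetical tree: nodes 0 and 1 under 2, nodes 3 and 4 under 5, nodes 2 and 5 under root 6.
example_parent = [2, 2, 6, 5, 5, 6, -1]
levels = reverse_level_order(example_parent)
sequence = post_order(example_parent)
print(levels)    # [[0, 1, 3, 4], [2, 5], [6]]
print(sequence)  # [0, 1, 2, 3, 4, 5, 6]
```

In this example the level order exposes four independent leaf nodes at once, whereas the post-order visits them one subtree at a time. Grouping by level also makes the amount of work per step explicit, which is one plausible reading of the "adaptive control of data size" mentioned in the abstract.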

Author information

Authors and Affiliations

Department of Mechanical Engineering, Hanyang University, Ansan, Gyeonggi-do, 15588, Korea
Jihyun Jung & Daesung Bae

Corresponding author

Correspondence to Daesung Bae.

About this article

Cite this article

Jung, J., Bae, D. Accelerating implicit integration in multi-body dynamics using GPU computing. Multibody Syst Dyn 42, 169–195 (2018). https://doi.org/10.1007/s11044-017-9588-1

Received: 15 November 2016
Accepted: 14 July 2017
Published: 07 August 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11044-017-9588-1
