Accelerating implicit integration in multi-body dynamics using GPU computing

Multibody System Dynamics

Abstract

A new direct linear equation solver is proposed for GPUs and applied to mechanical system analysis. In contrast to the DFS post-order traversal widely used in conventional implementations of supernodal and multifrontal methods, the solver adopts a BFS reverse-level-order traversal to expose more parallelism and to allow more adaptive control of data size. This allows large problems to be solved efficiently on many kinds of GPUs. Separators are divided into smaller blocks to further improve parallel efficiency. Numerical experiments show that the proposed method generally requires less factorization time than CHOLMOD and has better operational availability than SPQR. Mechanical dynamic analyses are carried out to demonstrate the efficiency of the method, and the computing time, memory usage, and solution accuracy are compared with those obtained from the DSS included in MKL. Compared with the CPU used in the experiments, the GPU solver is about 2.5–5.9 times faster in the numerical factorization step and approximately 1.9–4.7 times faster over the whole analysis process.
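The difference between the two traversal orders named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the tree, the `parent` array encoding, and the function names are hypothetical and chosen only for illustration. It shows that a DFS post-order yields one sequential list of nodes, whereas a BFS reverse-level order groups nodes into levels, and all nodes within a level are independent of one another and could in principle be processed concurrently.

```python
def children_lists(parent):
    """Build child lists and find the roots from a parent array (-1 marks a root)."""
    kids = [[] for _ in parent]
    roots = []
    for node, p in enumerate(parent):
        if p < 0:
            roots.append(node)
        else:
            kids[p].append(node)
    return kids, roots


def dfs_postorder(parent):
    """Return nodes in DFS post-order: every child appears before its parent."""
    kids, roots = children_lists(parent)
    order = []
    for root in roots:
        stack = [(root, False)]
        while stack:
            node, expanded = stack.pop()
            if expanded:
                order.append(node)
            else:
                stack.append((node, True))
                # Push children in reverse so the leftmost child is visited first.
                for child in reversed(kids[node]):
                    stack.append((child, False))
    return order


def bfs_reverse_levels(parent):
    """Return levels from deepest to shallowest.

    A child always sits one level deeper than its parent, so processing the
    levels in this order also handles every child before its parent.
    """
    kids, roots = children_lists(parent)
    levels = []
    frontier = list(roots)
    while frontier:
        levels.append(frontier)
        frontier = [c for n in frontier for c in kids[n]]
    levels.reverse()
    return levels


# Hypothetical tree: 6 is the root, 4 and 5 are its children,
# leaves 0 and 1 hang under 4, leaves 2 and 3 hang under 5.
parent = [4, 4, 5, 5, 6, 6, -1]

print(dfs_postorder(parent))       # one sequential ordering
print(bfs_reverse_levels(parent))  # batches of mutually independent nodes
```

In this toy tree the level-based order exposes four independent leaves in the first batch, which is the kind of parallelism a GPU can exploit; the grouping of work into levels also gives a natural unit for controlling how much data is processed at once.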

Author information

Corresponding author

Correspondence to Daesung Bae.

About this article

Cite this article

Jung, J., Bae, D. Accelerating implicit integration in multi-body dynamics using GPU computing. Multibody Syst Dyn 42, 169–195 (2018). https://doi.org/10.1007/s11044-017-9588-1

  • DOI: https://doi.org/10.1007/s11044-017-9588-1
