Abstract
Molecular-continuum simulations couple molecular dynamics (MD) and computational fluid dynamics (CFD) simulations in a domain decomposition sense to assess fluid flow at the nanoscale, e.g., in process engineering applications. When these simulations run on extreme-scale supercomputers, single compute cores or nodes may fail due to hardware or software errors. Such failures challenge the robustness of numerical simulations in general and of molecular-continuum systems in particular. We introduce a fault tolerance method in our macro-micro-coupling tool (MaMiCo), a software solution for molecular-continuum simulation. MaMiCo already leverages ensemble simulations to cope with statistical errors in the MD solutions; we extended this ensemble approach to recognize failing MPI processes and react to these failures. Once a failure is encountered, the affected MD simulations are removed from the failed MPI processes and relaunched on well-operating MPI process groups. We detail our approach and report scalability results obtained on the supercomputer HAWK at HLRS.
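The recovery step described above can be illustrated with a minimal sketch. It is not MaMiCo's implementation and uses no MPI calls: the function name `relaunch_failed_instances`, the dictionary-based bookkeeping, and the least-loaded placement rule are all assumptions made for illustration. The sketch only shows the idea that MD instances hosted on a process group containing a failed rank are moved to process groups that are still healthy.

```python
def relaunch_failed_instances(assignment, failed_ranks, group_of_rank):
    """Reassign MD instances from failed process groups to healthy ones.

    assignment:     dict md_instance_id -> process group id (hypothetical layout)
    failed_ranks:   set of rank ids reported as failed
    group_of_rank:  dict rank id -> process group id
    Returns a new assignment dict; the input is not modified.
    """
    # Assumption: a group counts as failed as soon as one of its ranks fails.
    failed_groups = {group_of_rank[r] for r in failed_ranks}
    healthy_groups = sorted(set(group_of_rank.values()) - failed_groups)
    if not healthy_groups:
        raise RuntimeError("no healthy process group left")

    # Keep unaffected instances in place and count the load per healthy group.
    load = {g: 0 for g in healthy_groups}
    new_assignment = {}
    for inst, group in assignment.items():
        if group not in failed_groups:
            new_assignment[inst] = group
            load[group] += 1

    # Relaunch each affected instance on the least-loaded healthy group
    # (ties are broken by the smaller group id).
    for inst in sorted(i for i, g in assignment.items() if g in failed_groups):
        target = min(healthy_groups, key=lambda g: (load[g], g))
        new_assignment[inst] = target
        load[target] += 1
    return new_assignment


# Hypothetical setup: 8 ranks in 4 groups of 2, and 8 MD instances (2 per group).
group_of_rank = {rank: rank // 2 for rank in range(8)}
assignment = {inst: inst % 4 for inst in range(8)}

# Rank 3 belongs to group 1, so the instances on group 1 must be moved.
result = relaunch_failed_instances(assignment, {3}, group_of_rank)
```

In a real run, failure detection and the rebuilding of communicators would have to come from the MPI layer; the sketch covers only the bookkeeping that decides where the affected ensemble members continue.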
Acknowledgements
We thank HLRS and the Gauss Centre for Supercomputing for the provision of computational resources (project GCS-MDDC). We further acknowledge funding for MaMiCo software developments by the project Macro/Micro-Simulation of Phase Decomposition in the Transcritical Regime (MaST) of the Digitalization and Technology Research Center of the Bundeswehr (dtec.bw) and the HSU-internal research funding (project Resilience and Dynamic Noise Reduction at Exascale for Multiscale Simulation Coupling).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jafari, V. et al. (2024). Fault Tolerant Molecular-Continuum Flow Simulation. In: Nagel, W.E., Kröner, D.H., Resch, M.M. (eds) High Performance Computing in Science and Engineering '22. HPCSE 2022. Springer, Cham. https://doi.org/10.1007/978-3-031-46870-4_30
DOI: https://doi.org/10.1007/978-3-031-46870-4_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46869-8
Online ISBN: 978-3-031-46870-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)