Abstract
On the path toward developing the first fusion energy devices, plasma simulations have become indispensable tools for supporting the design and development of fusion machines. Among these critical simulation tools, BIT1 is an advanced Particle-in-Cell (PIC) code with Monte Carlo collisions, specifically designed for modeling plasma-material interaction and, in particular, analyzing the power load distribution on tokamak divertors. The current implementation of BIT1 relies exclusively on MPI for parallel communication and lacks support for GPUs. In this work, we address these limitations by designing and implementing a hybrid, shared-memory version of BIT1 capable of utilizing GPUs. For shared-memory parallelization, we rely on OpenMP and OpenACC, using a task-based approach to mitigate load-imbalance issues in the particle mover. On an HPE Cray EX computing node, we observe an initial performance improvement of approximately 42%, with scalable performance showing an enhancement of about 38% when using 8 MPI ranks. Still relying on OpenMP and OpenACC, we introduce the first version of BIT1 capable of using GPUs. We investigate two different data movement strategies: unified memory and explicit data movement. For both, we report our findings on BIT1 data transfers during each PIC cycle. Among the BIT1 GPU implementations, we demonstrate performance improvement through concurrent GPU utilization, especially when MPI ranks are assigned to dedicated GPUs. Finally, we analyze the performance of the first BIT1 GPU port with the NVIDIA Nsight tools to further our understanding of BIT1's computational efficiency for large-scale plasma simulations capable of exploiting current supercomputer infrastructures.
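To make the two techniques named in the abstract concrete, the following is a minimal sketch in C, not BIT1's actual code. It assumes a hypothetical species_t layout, a free-streaming update (x += v * dt, with no field push or collisions), and illustrative function names (move_chunk, move_all_tasks, move_acc_explicit) that do not come from the paper. The first routine creates one OpenMP task per chunk of particles, so that species with very different particle counts are split into comparable work units. The second routine shows explicit data movement with OpenACC data clauses; with unified memory, the data clauses would typically be omitted and the runtime left to migrate the arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1D particle species; BIT1's real data layout is not shown here. */
typedef struct {
    double *x;   /* particle positions  */
    double *v;   /* particle velocities */
    size_t  n;   /* number of particles */
} species_t;

/* Advance particles in [begin, end): free streaming only (assumption). */
static void move_chunk(double *x, const double *v,
                       size_t begin, size_t end, double dt)
{
    for (size_t i = begin; i < end; ++i)
        x[i] += v[i] * dt;
}

/* Task-based mover: one thread creates a task per chunk of particles,
 * and the whole thread team executes the tasks. Large species yield
 * many tasks, small species few, which evens out the load. */
void move_all_tasks(species_t *sp, size_t nsp, double dt, size_t chunk)
{
    assert(chunk > 0);
    #pragma omp parallel
    #pragma omp single
    {
        for (size_t s = 0; s < nsp; ++s) {
            for (size_t b = 0; b < sp[s].n; b += chunk) {
                size_t e = (b + chunk < sp[s].n) ? b + chunk : sp[s].n;
                double *x = sp[s].x;
                const double *v = sp[s].v;
                #pragma omp task firstprivate(x, v, b, e, dt)
                move_chunk(x, v, b, e, dt);
            }
        }
    } /* the implicit barrier here guarantees all tasks have completed */
}

/* Explicit data movement with OpenACC: x is copied to the device and
 * back, v is only copied in. */
void move_acc_explicit(double *x, const double *v, size_t n, double dt)
{
    #pragma acc parallel loop copy(x[0:n]) copyin(v[0:n])
    for (size_t i = 0; i < n; ++i)
        x[i] += v[i] * dt;
}
```

When compiled without OpenMP or OpenACC support, the pragmas are ignored and both routines run serially with the same results, which is convenient for checking correctness. The chunk size is a tuning parameter: smaller chunks balance load better but add task-creation overhead.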
Acknowledgments
Funded by the European Union. This work has received funding from the European High Performance Computing Joint Undertaking (JU) and Sweden, Finland, Germany, Greece, France, Slovenia, Spain, and Czech Republic under grant agreement No 101093261.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Williams, J.J., Liu, F., Tskhakaya, D., Costea, S., Podolnik, A., Markidis, S. (2024). Optimizing BIT1, a Particle-in-Cell Monte Carlo Code, with OpenMP/OpenACC and GPU Acceleration. In: Franco, L., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2024. ICCS 2024. Lecture Notes in Computer Science, vol 14832. Springer, Cham. https://doi.org/10.1007/978-3-031-63749-0_22
DOI: https://doi.org/10.1007/978-3-031-63749-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-63748-3
Online ISBN: 978-3-031-63749-0
eBook Packages: Computer Science (R0)