A CUDA-based parallel optimization method for SM3 hash algorithm

Han, Jichang; Peng, Tao; Zhang, Xuesong

doi:10.1007/s11227-024-06141-6

Published: 10 June 2024

The Journal of Supercomputing

Abstract

Hash algorithms are among the most important algorithms in cryptography, and SM3 is China's national cryptographic hash standard. Because hash algorithms offer strong collision resistance and irreversibility, they serve as a basic building block in fields such as digital signatures and random number generation. As real-time automation spreads through finance and office applications, networks place higher demands on the implementation efficiency of SM3. We present a CUDA-based parallel optimization method for the SM3 algorithm in four variants: Single data stream with Single thread (SS), Multiple data streams with Single thread (MS), Single data stream with Multi-thread (SM), and Multiple data streams with Multi-thread (MM). Experimental results show that MM performs best of the four. When data transfer between CPU and GPU is included, the proposed algorithm achieves a peak throughput of 166.42 Gb/s, 1.96 times that of the best-known SM3 implementation on GPU platforms. When transfer time is excluded, peak throughput is close to 8500 Gb/s. Compared with other GPU implementations of SM3, the proposed algorithm significantly improves the efficiency of digest generation. The results also lead to a new conclusion: optimization of the logical operations in SM3 is already highly advanced, and PCIe data transfer has become the bottleneck in the CPU+GPU processing model. Future work on optimizing SM3 should therefore focus on PCIe data transfer efficiency.
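
The bottleneck claim can be checked with simple arithmetic on the figures quoted in the abstract. The sketch below is an illustration, not the authors' analysis. It assumes that both peak figures refer to the same data volume, that "near 8500 Gb/s" can be treated as exactly 8500, and that all time outside the kernel is attributable to CPU-GPU transfer. None of these assumptions is stated in the abstract, and the function and variable names are hypothetical.

```python
# Figures quoted in the abstract (Gb/s). Treating "near 8500" as exactly
# 8500 is an assumption made only for this estimate.
END_TO_END_GBPS = 166.42    # peak throughput including CPU<->GPU transfer
KERNEL_ONLY_GBPS = 8500.0   # peak throughput excluding transfer (approximate)
SPEEDUP_VS_BEST_KNOWN = 1.96


def implied_baseline_gbps(throughput: float, speedup: float) -> float:
    """Throughput of the prior best implementation implied by the speedup."""
    return throughput / speedup


def transfer_time_share(end_to_end: float, kernel_only: float) -> float:
    """Fraction of end-to-end time spent outside the kernel.

    For a fixed data volume D, end_to_end = D / (t_kernel + t_other) and
    kernel_only = D / t_kernel, so t_kernel / t_total = end_to_end / kernel_only.
    """
    return 1.0 - end_to_end / kernel_only


baseline = implied_baseline_gbps(END_TO_END_GBPS, SPEEDUP_VS_BEST_KNOWN)
gap = KERNEL_ONLY_GBPS / END_TO_END_GBPS
share = transfer_time_share(END_TO_END_GBPS, KERNEL_ONLY_GBPS)

print(f"implied prior best: {baseline:.2f} Gb/s")
print(f"kernel-only vs end-to-end: {gap:.1f}x")
print(f"time outside the kernel: {share:.1%}")
```

Under these assumptions the prior best implementation would sit at roughly 85 Gb/s, the kernel alone would be about 51 times faster than the end-to-end pipeline, and about 98% of end-to-end time would be spent outside the hash computation. This is consistent with the abstract's conclusion that transfer dominates, not compute.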

Author information

Authors and Affiliations

College of Computer Science and Technology, Jilin University, Changchun, 130012, Jilin, China
Jichang Han, Tao Peng & Xuesong Zhang

Contributions

H.: Methodology, Writing - Original Draft, Review & Editing. P.: Formal analysis, Validation. Z.: Software and Investigation. H., P., and Z.: all authors contributed to the final version of the manuscript. P. Z. supervised the project.

Corresponding author

Correspondence to Xuesong Zhang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, J., Peng, T. & Zhang, X. A CUDA-based parallel optimization method for SM3 hash algorithm. J Supercomput (2024). https://doi.org/10.1007/s11227-024-06141-6

Accepted: 10 April 2024
Published: 10 June 2024
DOI: https://doi.org/10.1007/s11227-024-06141-6
