Abstract
Hash algorithms are among the most important primitives in cryptography, and SM3 is a Chinese cryptographic hash standard. Because hash algorithms offer strong collision resistance and irreversibility, they are widely used as basic building blocks in areas such as digital signatures and random number generation. As real-time automated applications spread in finance and office settings, networks place higher demands on the efficiency of SM3 implementations. We present a CUDA-based parallel optimization method for the SM3 algorithm in four modes: Single data stream with Single thread (SS), Multiple data streams with Single thread (MS), Single data stream with Multi-thread (SM), and Multiple data streams with Multi-thread (MM). Experimental results show that MM performs best of the four. When data transmission between the CPU and GPU is taken into account, the proposed algorithm achieves a peak performance of 166.42 Gb/s, which is 1.96 times that of the best-known SM3 implementation on GPU platforms. When transmission time is excluded, peak performance is close to 8500 Gb/s. Compared with other GPU implementations of SM3, the proposed algorithm significantly improves the efficiency of digest generation. The results also lead to a new conclusion: the logical operations in SM3 are already highly optimized, and PCIe data transmission has become the bottleneck in the CPU+GPU processing mode. Future work on optimizing SM3 should therefore focus on PCIe data transfer efficiency.
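To make the idea of hashing multiple data streams concrete, the following is a minimal CPU-side sketch in Python, not the authors' CUDA kernels. It implements SM3 as commonly specified (padding, message expansion, 64-round compression) and then hashes a batch of independent messages with a thread pool. The names `sm3`, `compress`, and `sm3_batch` are hypothetical, and reading "multiple data streams" as "independent messages hashed concurrently" is an assumption made only for illustration. Python threads will not give a real speedup here; the point is that blocks within one message are chained through the state `v`, while separate messages share no state.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

MASK = 0xFFFFFFFF
# Initial chaining value of SM3.
IV = (0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E)


def rotl(x, n):
    """Rotate a 32-bit word left by n bits (n taken modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def p0(x):
    return x ^ rotl(x, 9) ^ rotl(x, 17)


def p1(x):
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def compress(v, block):
    """Apply the SM3 compression function to one 64-byte block."""
    # Message expansion: 16 big-endian words extended to 68 words.
    w = list(struct.unpack(">16I", block))
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15))
                 ^ rotl(w[j - 13], 7) ^ w[j - 6])
    a, b, c, d, e, f, g, h = v
    for j in range(64):
        t = 0x79CC4519 if j < 16 else 0x7A879D8A
        ss1 = rotl((rotl(a, 12) + e + rotl(t, j)) & MASK, 7)
        ss2 = ss1 ^ rotl(a, 12)
        if j < 16:
            ff = a ^ b ^ c
            gg = e ^ f ^ g
        else:
            ff = (a & b) | (a & c) | (b & c)
            gg = (e & f) | (~e & g)
        tt1 = (ff + d + ss2 + (w[j] ^ w[j + 4])) & MASK
        tt2 = (gg + h + ss1 + w[j]) & MASK
        a, b, c, d = tt1, a, rotl(b, 9), c
        e, f, g, h = p0(tt2), e, rotl(f, 19), g
    # Feed-forward: XOR the round output with the previous state.
    return tuple(x ^ y for x, y in zip(v, (a, b, c, d, e, f, g, h)))


def sm3(msg: bytes) -> str:
    """Return the SM3 digest of msg as a 64-character hex string."""
    # Pad with 0x80, zero bytes, and the 64-bit big-endian bit length.
    padded = (msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64)
              + struct.pack(">Q", len(msg) * 8))
    v = IV
    # Blocks of one message are processed strictly in order.
    for i in range(0, len(padded), 64):
        v = compress(v, padded[i:i + 64])
    return "".join(f"{x:08x}" for x in v)


def sm3_batch(messages, workers=4):
    """Hash independent messages concurrently (illustrative only)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sm3, messages))


if __name__ == "__main__":
    for digest in sm3_batch([b"abc", b"hello", b"a" * 200]):
        print(digest)
```

A GPU version would presumably assign each message, or parts of the work for one message, to device threads; how the four modes in the abstract divide that work is not specified here, so this sketch should not be read as their design.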
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11227-024-06141-6/MediaObjects/11227_2024_6141_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11227-024-06141-6/MediaObjects/11227_2024_6141_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11227-024-06141-6/MediaObjects/11227_2024_6141_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11227-024-06141-6/MediaObjects/11227_2024_6141_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11227-024-06141-6/MediaObjects/11227_2024_6141_Fig5_HTML.png)
Author information
Contributions
H.: Methodology, Writing - Original Draft, Review & Editing. P.: Formal analysis, Validation. Z.: Software and Investigation. All authors (H., P. and Z.) contributed to the final version of the manuscript. P. Z. supervised the project.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Han, J., Peng, T. & Zhang, X. A CUDA-based parallel optimization method for SM3 hash algorithm. J Supercomput (2024). https://doi.org/10.1007/s11227-024-06141-6
DOI: https://doi.org/10.1007/s11227-024-06141-6