A CUDA-based parallel optimization method for SM3 hash algorithm

The Journal of Supercomputing

Abstract

Hash algorithms are among the most important primitives in cryptography, and SM3 is China's national cryptographic hash standard. Because hash algorithms offer strong collision resistance and are irreversible, they serve as a basic building block in fields such as digital signatures and random number generation. As real-time automated applications grow in finance and office settings, networks place higher demands on the efficiency of SM3 implementations. We present a CUDA-based parallel optimization method for SM3 and evaluate four variants: Single data stream with Single thread (SS), Multiple data streams with Single thread (MS), Single data stream with Multi-thread (SM), and Multiple data streams with Multi-thread (MM). Experimental results show that MM performs best of the four. When data transmission between CPU and GPU is included, the proposed algorithm achieves a peak throughput of 166.42 Gb/s, which is 1.96 times that of the best-known implementation of SM3 on GPU platforms. When transmission time is excluded, peak throughput is close to 8500 Gb/s. Compared with other GPU implementations of SM3, the proposed algorithm significantly improves the efficiency of digest generation. The results also lead to a new conclusion: the logical operations in SM3 are already highly optimized, and PCIe data transmission has become the bottleneck in the CPU+GPU processing model. Future work on optimizing SM3 should therefore focus on PCIe data transfer efficiency.
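
The throughput figures in the abstract can be related to one another with a short back-of-the-envelope calculation. The sketch below is illustrative only and is not taken from the paper: it assumes that CPU-to-GPU transfer and kernel execution run back to back without overlap, so that per-bit times simply add. The function names are hypothetical, and only the three constants come from the abstract.

```python
# Back-of-the-envelope model of the abstract's throughput figures.
# ASSUMPTION (ours, not the paper's): transfer and compute are serialized,
# so time per bit = transfer time per bit + kernel time per bit.

END_TO_END_GBPS = 166.42    # peak with CPU<->GPU transfer counted (abstract)
KERNEL_ONLY_GBPS = 8500.0   # approximate peak without transfer time (abstract)
SPEEDUP = 1.96              # ratio over the best-known GPU implementation (abstract)


def implied_baseline_gbps(end_to_end: float, speedup: float) -> float:
    """Throughput of the prior best implementation implied by the speedup."""
    return end_to_end / speedup


def implied_transfer_gbps(end_to_end: float, kernel_only: float) -> float:
    """Transfer rate implied when per-bit transfer and compute times add."""
    return 1.0 / (1.0 / end_to_end - 1.0 / kernel_only)


def transfer_time_share(end_to_end: float, kernel_only: float) -> float:
    """Fraction of end-to-end time spent on transfer under the serial model."""
    return 1.0 - end_to_end / kernel_only


if __name__ == "__main__":
    print(f"implied baseline:  {implied_baseline_gbps(END_TO_END_GBPS, SPEEDUP):.2f} Gb/s")
    print(f"implied transfer:  {implied_transfer_gbps(END_TO_END_GBPS, KERNEL_ONLY_GBPS):.2f} Gb/s")
    print(f"transfer share:    {transfer_time_share(END_TO_END_GBPS, KERNEL_ONLY_GBPS):.1%}")
```

Under this simplified serial model, roughly 98% of the end-to-end time would be spent moving data rather than hashing, which is consistent with the abstract's conclusion that PCIe transfer, not the SM3 logic, limits performance. If transfers and kernels overlap in practice, the split would differ.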

Author information

Contributions

H.: Methodology, Writing - Original Draft, Review & Editing. P.: Formal analysis, Validation. Z.: Software and Investigation. All authors (H., P., and Z.) contributed to the final version of the manuscript. P. Z. supervised the project.

Corresponding author

Correspondence to Xuesong Zhang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, J., Peng, T. & Zhang, X. A CUDA-based parallel optimization method for SM3 hash algorithm. J Supercomput (2024). https://doi.org/10.1007/s11227-024-06141-6

  • DOI: https://doi.org/10.1007/s11227-024-06141-6
