Abstract
This paper presents a comprehensive analysis of the main SIMD-processing features and computational characteristics of three high-performance architectures: two NVIDIA GPU architectures (of the Pascal and Volta generations) and the NEC SX-Aurora TSUBASA vector processor. Since both types of architecture rely strongly on SIMD processing, certain similarities in their data-processing principles can be found. However, although both the NVIDIA GPU and NEC SX-Aurora TSUBASA architectures include vectorised data processing, they implement their vectorisation features in completely different ways. These differences lead to several fundamental restrictions on the classes of algorithms that can be implemented efficiently on the corresponding platforms. This paper investigates the possibility of porting various classes of programs and algorithms between the discussed architectures, with a focus on utilising all available vectorisation features. This problem cannot be approached without a detailed analysis of the similarities and differences between the SIMD-processing features of these architectures. The performed analysis allowed us to identify several important examples of typical applications and algorithms, including reduction operations, programs relying on frequent indirect memory accesses, and data transfers through the co-processor interconnect. Some of them demonstrated comparable efficiency on NVIDIA GPUs and NEC SX-Aurora TSUBASA vector processors, while others showed different efficiency. Moreover, the conducted analysis makes it easy to extend this set of examples in order to approach the problem of automated porting of programs between the reviewed architectures, which we consider an important direction of our future research.
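As a rough illustration of two of the application classes named in the abstract, the following C++ sketch shows a sum reduction and a gather-style loop with indirect memory accesses. It is not taken from the paper: the function names `sum_reduction` and `gather` are hypothetical, and the loops are written in plain scalar form, which is only the starting point that a vectorising compiler or a hand-written GPU kernel would have to restructure.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper): sum reduction.
// Every iteration updates the same accumulator, so a SIMD implementation
// has to combine partial sums instead of processing elements independently.
double sum_reduction(const std::vector<double>& data) {
    double sum = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        sum += data[i];
    }
    return sum;
}

// Hypothetical helper (not from the paper): indirect memory access (gather).
// The address of each load depends on the contents of `indices`, so
// neighbouring iterations may touch unrelated memory locations.
// Assumes every index is smaller than data.size().
std::vector<double> gather(const std::vector<double>& data,
                           const std::vector<std::size_t>& indices) {
    std::vector<double> result(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i) {
        result[i] = data[indices[i]];
    }
    return result;
}
```

In the reduction the difficulty is the dependency carried through the accumulator, whereas in the gather loop it is the data-dependent access pattern; how each architecture handles these two cases is the subject of the analysis in the paper itself.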
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Afanasyev, I.V., Voevodin, V.V., Voevodin, V.V., Komatsu, K., Kobayashi, H. (2019). Analysis of Relationship Between SIMD-Processing Features Used in NVIDIA GPUs and NEC SX-Aurora TSUBASA Vector Processors. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2019. Lecture Notes in Computer Science, vol 11657. Springer, Cham. https://doi.org/10.1007/978-3-030-25636-4_10
DOI: https://doi.org/10.1007/978-3-030-25636-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25635-7
Online ISBN: 978-3-030-25636-4
eBook Packages: Computer Science, Computer Science (R0)