Analysis of Relationship Between SIMD-Processing Features Used in NVIDIA GPUs and NEC SX-Aurora TSUBASA Vector Processors

  • Conference paper
Parallel Computing Technologies (PaCT 2019)

Abstract

This paper presents a comprehensive analysis of the main SIMD-processing features and computational characteristics of three high-performance architectures: two NVIDIA GPU architectures (the Pascal and Volta generations) and the NEC SX-Aurora TSUBASA vector processor. Since both types of architecture rely strongly on SIMD processing, certain similarities in data-processing principles can be found between them. However, although both NVIDIA GPUs and NEC SX-Aurora TSUBASA include vectorised data processing, their vectorisation features are implemented in completely different ways. These differences lead to several fundamental restrictions on the classes of algorithms that can be implemented efficiently on the corresponding platforms. This paper investigates the possibility of porting various classes of programs and algorithms between the discussed architectures, with a focus on utilising all available vectorisation features. This problem cannot be approached without a detailed analysis of the similar and differing SIMD-processing features of these architectures. The performed analysis allowed us to identify several important examples of typical applications and algorithms, including reduction operations, programs relying on frequent indirect memory accesses, and data transfers through the co-processor interconnect. Some of them demonstrated comparable efficiency on NVIDIA GPUs and NEC SX-Aurora TSUBASA vector processors, while others showed different efficiency. Moreover, the conducted analysis makes it easy to extend this set of examples in order to approach the problem of automated porting of programs between the reviewed architectures, which we consider an important direction of our future research.
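To make two of the kernel classes named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' code. It models a lane-wise reduction and an indirect (gather) access on an abstract SIMD unit. The function names `tree_reduce` and `gather` and the `width` parameter are hypothetical. The widths 32 and 256 in the usage note are chosen only to echo a GPU warp and a long vector register, and say nothing about measured performance.

```python
def tree_reduce(values, width):
    """Sum `values` on a simulated SIMD unit with `width` lanes.

    `width` is assumed to be a power of two (an illustrative parameter,
    not a value taken from the paper).
    """
    # Phase 1: every lane accumulates a strided partial sum, which is
    # what a sequence of full-width vector additions would compute.
    lanes = [0] * width
    for start in range(0, len(values), width):
        for lane, value in enumerate(values[start:start + width]):
            lanes[lane] += value

    # Phase 2: log2(width) halving steps fold the upper half of the
    # lanes onto the lower half until lane 0 holds the total.
    step = width // 2
    while step >= 1:
        for lane in range(step):
            lanes[lane] += lanes[lane + step]
        step //= 2
    return lanes[0]


def gather(data, indices):
    """Indirect memory access: read data[indices[i]] for every i."""
    return [data[i] for i in indices]
```

Calling `tree_reduce(list(range(1000)), 32)` and `tree_reduce(list(range(1000)), 256)` returns the same sum. Only the split between the accumulation phase and the cross-lane combination phase changes with the lane count, and that split is the kind of structural difference a port between the two platforms has to account for.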



Author information

Corresponding author

Correspondence to Ilya V. Afanasyev.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Afanasyev, I.V., Voevodin, V.V., Voevodin, V.V., Komatsu, K., Kobayashi, H. (2019). Analysis of Relationship Between SIMD-Processing Features Used in NVIDIA GPUs and NEC SX-Aurora TSUBASA Vector Processors. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2019. Lecture Notes in Computer Science, vol 11657. Springer, Cham. https://doi.org/10.1007/978-3-030-25636-4_10

  • DOI: https://doi.org/10.1007/978-3-030-25636-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-25635-7

  • Online ISBN: 978-3-030-25636-4

  • eBook Packages: Computer Science, Computer Science (R0)
