Developing an Efficient Vector-Friendly Implementation of the Breadth-First Search Algorithm for NEC SX-Aurora TSUBASA

Afanasyev, Ilya V.; Voevodin, Vladimir V.; Komatsu, Kazuhiko; Kobayashi, Hiroaki

doi:10.1007/978-3-030-55326-5_10

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1263)

Included in the following conference series:

International Conference on Parallel Computational Technologies


Abstract

Breadth-First Search (BFS) is an important computational kernel used as a building block for many other graph algorithms. Different algorithms and implementation approaches for the BFS problem have been proposed for various computational platforms, with the direction-optimizing algorithm being the fastest and most computationally efficient for many real-world graph types. However, a straightforward implementation of direction-optimizing BFS on vector computers can be extremely challenging and inefficient due to the high irregularity of both the graph data structures and the algorithm itself. This paper describes the world’s first attempt to create an efficient vector-friendly implementation of the direction-optimizing BFS algorithm for the NEC SX-Aurora TSUBASA architecture. SX-Aurora TSUBASA vector processors combine high computational performance with the world’s highest memory bandwidth, making them a very interesting platform for solving various graph-processing problems. The implementation proposed in this paper significantly outperforms existing state-of-the-art implementations for both modern CPUs (Intel Skylake) and NVIDIA V100 GPUs. In addition, it achieves significantly higher energy efficiency than the other platforms and implementations, both in terms of average power consumption and achieved performance per watt.
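The direction-optimizing algorithm mentioned above (Beamer, Asanović and Patterson) alternates between a top-down step, in which frontier vertices visit their neighbours, and a bottom-up step, in which each unvisited vertex searches for a parent in the frontier and stops at the first hit. The C sketch below only illustrates that general idea. It is not the authors' SX-Aurora TSUBASA implementation. The `csr_graph` type and all function names are hypothetical, the graph is assumed to be undirected and stored in CSR form, and the switching thresholds alpha = 14 and beta = 24 follow the defaults reported by Beamer et al. They should be treated as tunable assumptions. A dense level array is used instead of a frontier queue so that each step is a flat loop over all vertices.

```c
/* Sketch of direction-optimizing BFS (after Beamer, Asanovic and Patterson).
   Assumption: undirected graph in CSR form, so adj[] lists every edge in both
   directions and serves both the top-down and the bottom-up step. */

typedef struct {
    int n;               /* number of vertices */
    const int *offsets;  /* n + 1 entries: neighbours of v are adj[offsets[v] .. offsets[v + 1] - 1] */
    const int *adj;      /* offsets[n] neighbour ids */
} csr_graph;

/* Top-down step: each frontier vertex (levels[v] == cur) assigns level cur + 1
   to its unvisited neighbours. Returns the size of the next frontier. */
static int top_down_step(const csr_graph *g, int *levels, int cur) {
    int added = 0;
    for (int v = 0; v < g->n; v++) {
        if (levels[v] != cur) continue;
        for (int e = g->offsets[v]; e < g->offsets[v + 1]; e++) {
            int u = g->adj[e];
            if (levels[u] == -1) {
                levels[u] = cur + 1;
                added++;
            }
        }
    }
    return added;
}

/* Bottom-up step: each unvisited vertex scans its neighbours for one in the
   current frontier and breaks at the first hit, skipping its remaining edges. */
static int bottom_up_step(const csr_graph *g, int *levels, int cur) {
    int added = 0;
    for (int v = 0; v < g->n; v++) {
        if (levels[v] != -1) continue;
        for (int e = g->offsets[v]; e < g->offsets[v + 1]; e++) {
            if (levels[g->adj[e]] == cur) {
                levels[v] = cur + 1;
                added++;
                break;
            }
        }
    }
    return added;
}

/* Fills levels[v] with the BFS depth of v from source (-1 if unreachable) and
   returns how many steps ran bottom-up. alpha and beta are tuning parameters. */
int bfs_direction_optimizing(const csr_graph *g, int source, int *levels) {
    const double alpha = 14.0, beta = 24.0;
    for (int v = 0; v < g->n; v++) levels[v] = -1;
    levels[source] = 0;

    int cur = 0, frontier_size = 1, prev_size = 0, bottom_up = 0, bu_steps = 0;
    while (frontier_size > 0) {
        /* m_f: edges leaving the frontier; m_u: edges leaving unvisited vertices.
           Recomputed with a full scan here for clarity, not for efficiency. */
        long m_f = 0, m_u = 0;
        for (int v = 0; v < g->n; v++) {
            long deg = g->offsets[v + 1] - g->offsets[v];
            if (levels[v] == cur) m_f += deg;
            else if (levels[v] == -1) m_u += deg;
        }
        int growing = frontier_size > prev_size;
        int shrinking = frontier_size < prev_size;
        if (!bottom_up && growing && m_f > m_u / alpha)
            bottom_up = 1;  /* large, growing frontier: switch to bottom-up */
        else if (bottom_up && shrinking && frontier_size < g->n / beta)
            bottom_up = 0;  /* small, shrinking frontier: back to top-down */

        prev_size = frontier_size;
        if (bottom_up) {
            frontier_size = bottom_up_step(g, levels, cur);
            bu_steps++;
        } else {
            frontier_size = top_down_step(g, levels, cur);
        }
        cur++;
    }
    return bu_steps;
}
```

The early `break` in the bottom-up step is what makes it cheaper than the top-down step when the frontier is large. It also means that the inner loop's trip count varies from vertex to vertex. This is one concrete form of the irregularity that makes a straightforward mapping onto long vector instructions difficult.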


Similar content being viewed by others

VGL: a high-performance graph processing framework for the NEC SX-Aurora TSUBASA vector architecture

Article 26 January 2021

Developing Efficient Implementations of Shortest Paths and Page Rank Algorithms for NEC SX-Aurora TSUBASA Architecture

Article 27 November 2019

Developing Efficient Implementations of Connected Component Algorithms for NEC SX-Aurora TSUBASA

Article 01 August 2020


Acknowledgement

This project was partially supported by JSPS Bilateral Joint Research Projects program, entitled “Theory and Practice of Vector Data Processing at Extreme Scale: Back to the Future” and by MEXT Next Generation High-Performance Computing Infrastructures and Applications R&D Program, entitled “R&D of A Quantum-Annealing-Assisted Next Generation HPC Infrastructure and its Applications”. The reported study was funded by RFBR, project number 19-31-27001. The reported study was supported by the Russian Foundation for Basic Research, project No. 18-29-03230. The research was carried out using the equipment of the shared research facilities of HPC computing resources at Lomonosov Moscow State University.

Author information

Authors and Affiliations

Research Computing Center of Moscow State University, Moscow, 119234, Russia
Ilya V. Afanasyev & Vladimir V. Voevodin
Tohoku University, Sendai, Miyagi, 980-8579, Japan
Kazuhiko Komatsu & Hiroaki Kobayashi


Corresponding author

Correspondence to Ilya V. Afanasyev.

Editor information

Editors and Affiliations

South Ural State University, Chelyabinsk, Russia
Leonid Sokolinsky
South Ural State University, Chelyabinsk, Russia
Mikhail Zymbler


About this paper

Cite this paper

Afanasyev, I.V., Voevodin, V.V., Komatsu, K., Kobayashi, H. (2020). Developing an Efficient Vector-Friendly Implementation of the Breadth-First Search Algorithm for NEC SX-Aurora TSUBASA. In: Sokolinsky, L., Zymbler, M. (eds) Parallel Computational Technologies. PCT 2020. Communications in Computer and Information Science, vol 1263. Springer, Cham. https://doi.org/10.1007/978-3-030-55326-5_10


DOI: https://doi.org/10.1007/978-3-030-55326-5_10
Published: 26 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55325-8
Online ISBN: 978-3-030-55326-5
eBook Packages: Computer Science, Computer Science (R0)
