Developing an Efficient Vector-Friendly Implementation of the Breadth-First Search Algorithm for NEC SX-Aurora TSUBASA

  • Conference paper
Parallel Computational Technologies (PCT 2020)

Abstract

Breadth-First Search (BFS) is an important computational kernel used as a building block for many other graph algorithms. Various algorithms and implementation approaches for BFS have been proposed for different computational platforms, with the direction-optimizing algorithm being the fastest and most computationally efficient for many real-world graph types. However, a straightforward implementation of direction-optimizing BFS on vector computers can be extremely challenging and inefficient because of the high irregularity of both the graph data structure and the algorithm itself. This paper describes the world’s first attempt to create an efficient vector-friendly implementation of the direction-optimizing BFS algorithm for the NEC SX-Aurora TSUBASA architecture. SX-Aurora TSUBASA vector processors combine high computational performance with the world’s highest-bandwidth memory, making them a very interesting platform for solving various graph-processing problems. The implementation proposed in this paper significantly outperforms existing state-of-the-art implementations for both modern CPUs (Intel Skylake) and NVIDIA V100 GPUs. In addition, it achieves significantly higher energy efficiency than other platforms and implementations, both in terms of average power consumption and achieved performance per watt.
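
For readers unfamiliar with the direction-optimizing approach named in the abstract, the following is a minimal scalar sketch of the general idea: the search switches between a top-down step (frontier vertices push to their neighbours) and a bottom-up step (unvisited vertices look for a parent in the frontier). It is not the paper's vectorized implementation. The function name, the adjacency-list input, and the threshold values `alpha` and `beta` are illustrative assumptions, and the graph is assumed to be undirected.

```python
def bfs_direction_optimizing(adj, source, alpha=14, beta=24):
    """Return BFS levels (-1 = unreachable) for an undirected graph.

    adj   : list of neighbour lists (assumed symmetric)
    alpha : illustrative threshold for switching top-down -> bottom-up
    beta  : illustrative threshold for switching bottom-up -> top-down
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    bottom_up = False
    # Number of edge endpoints that belong to still-unvisited vertices.
    unvisited_edges = sum(len(nbrs) for nbrs in adj) - len(adj[source])

    while frontier:
        frontier_edges = sum(len(adj[v]) for v in frontier)
        # Heuristic: go bottom-up when the frontier is "heavy",
        # return to top-down when it becomes small again.
        if not bottom_up and frontier_edges > unvisited_edges / alpha:
            bottom_up = True
        elif bottom_up and len(frontier) < n / beta:
            bottom_up = False

        depth += 1
        next_frontier = []
        if bottom_up:
            # Each unvisited vertex stops at the first neighbour
            # found in the current frontier (level == depth - 1).
            for v in range(n):
                if level[v] == -1:
                    for u in adj[v]:
                        if level[u] == depth - 1:
                            level[v] = depth
                            next_frontier.append(v)
                            break
        else:
            # Classic top-down expansion of the frontier.
            for u in frontier:
                for v in adj[u]:
                    if level[v] == -1:
                        level[v] = depth
                        next_frontier.append(v)

        unvisited_edges -= sum(len(adj[v]) for v in next_frontier)
        frontier = next_frontier
    return level


# Small undirected example; vertex 5 is isolated.
example = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3], []]
levels = bfs_direction_optimizing(example, 0)
```

The early `break` in the bottom-up step is what saves work on dense frontiers. The data-dependent loop lengths in both steps are one example of the irregularity that makes a direct vector implementation difficult.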



Acknowledgement

This project was partially supported by the JSPS Bilateral Joint Research Projects program, entitled “Theory and Practice of Vector Data Processing at Extreme Scale: Back to the Future”, and by the MEXT Next Generation High-Performance Computing Infrastructures and Applications R&D Program, entitled “R&D of A Quantum-Annealing-Assisted Next Generation HPC Infrastructure and its Applications”. The reported study was funded by RFBR, project number 19-31-27001. The reported study was supported by the Russian Foundation for Basic Research, project No. 18-29-03230. The research was carried out using the equipment of the shared research facilities of HPC computing resources at Lomonosov Moscow State University.

Author information

Corresponding author

Correspondence to Ilya V. Afanasyev.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Afanasyev, I.V., Voevodin, V.V., Komatsu, K., Kobayashi, H. (2020). Developing an Efficient Vector-Friendly Implementation of the Breadth-First Search Algorithm for NEC SX-Aurora TSUBASA. In: Sokolinsky, L., Zymbler, M. (eds) Parallel Computational Technologies. PCT 2020. Communications in Computer and Information Science, vol 1263. Springer, Cham. https://doi.org/10.1007/978-3-030-55326-5_10

  • DOI: https://doi.org/10.1007/978-3-030-55326-5_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-55325-8

  • Online ISBN: 978-3-030-55326-5

  • eBook Packages: Computer Science, Computer Science (R0)
