Abstract
Breadth-First Search (BFS) is an important computational kernel used as a building block for many other graph algorithms. Various algorithms and implementation approaches for solving the BFS problem have been proposed for different computational platforms, with the direction-optimizing algorithm being the fastest and most computationally efficient for many real-world graph types. However, a straightforward implementation of direction-optimizing BFS on vector computers can be extremely challenging and inefficient because both the graph data structure and the algorithm itself are highly irregular. This paper describes the world’s first attempt to create an efficient vector-friendly implementation of the direction-optimizing BFS algorithm for the NEC SX-Aurora TSUBASA architecture. SX-Aurora TSUBASA vector processors combine high computational performance with memory that has the world’s highest bandwidth, which makes them a very interesting platform for solving various graph-processing problems. The implementation proposed in this paper significantly outperforms the existing state-of-the-art implementations for both modern CPUs (Intel Skylake) and NVIDIA V100 GPUs. In addition, the proposed implementation achieves significantly higher energy efficiency than other platforms and implementations, both in terms of average power consumption and in terms of achieved performance per watt.
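For readers unfamiliar with the direction-optimizing idea mentioned in the abstract, the following is a minimal scalar sketch in Python, not the vectorized SX-Aurora TSUBASA implementation the paper proposes. It switches between a top-down step (frontier vertices push to unvisited neighbours) and a bottom-up step (unvisited vertices look for a parent in the frontier). The function name, the adjacency-list input format, and the `alpha` and `beta` switching thresholds are assumptions chosen for illustration only.

```python
def direction_optimizing_bfs(adj, source, alpha=15, beta=18):
    """Return BFS levels for an undirected graph given as adjacency lists.

    Unreachable vertices keep level -1. alpha and beta are illustrative
    tuning parameters (assumed values) that control the direction switch.
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    bottom_up = False

    while frontier:
        # Edges leaving the frontier vs. edges incident to unvisited vertices.
        frontier_edges = sum(len(adj[v]) for v in frontier)
        unvisited_edges = sum(len(adj[v]) for v in range(n) if level[v] == -1)

        if not bottom_up and frontier_edges > unvisited_edges / alpha:
            bottom_up = True       # frontier is large: searching for parents is cheaper
        elif bottom_up and len(frontier) < n / beta:
            bottom_up = False      # frontier is small again: push from the frontier

        depth += 1
        next_frontier = []
        if bottom_up:
            # Bottom-up step: each unvisited vertex stops at its first frontier neighbour.
            for v in range(n):
                if level[v] != -1:
                    continue
                for u in adj[v]:
                    if level[u] == depth - 1:
                        level[v] = depth
                        next_frontier.append(v)
                        break
        else:
            # Top-down step: frontier vertices mark their unvisited neighbours.
            for u in frontier:
                for v in adj[u]:
                    if level[v] == -1:
                        level[v] = depth
                        next_frontier.append(v)
        frontier = next_frontier
    return level


if __name__ == "__main__":
    path_graph = [[1], [0, 2], [1, 3], [2]]
    print(direction_optimizing_bfs(path_graph, 0))
```

The early exit in the bottom-up step is what saves edge inspections on large frontiers. It is also one source of the irregular, data-dependent control flow that the abstract identifies as an obstacle to straightforward vectorization.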
Acknowledgement
This project was partially supported by the JSPS Bilateral Joint Research Projects program, entitled “Theory and Practice of Vector Data Processing at Extreme Scale: Back to the Future”, and by the MEXT Next Generation High-Performance Computing Infrastructures and Applications R&D Program, entitled “R&D of A Quantum-Annealing-Assisted Next Generation HPC Infrastructure and its Applications”. The reported study was funded by RFBR, project number 19-31-27001. The reported study was supported by the Russian Foundation for Basic Research, project No. 18-29-03230. The research was carried out using the equipment of the shared research facilities of HPC computing resources at Lomonosov Moscow State University.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Afanasyev, I.V., Voevodin, V.V., Komatsu, K., Kobayashi, H. (2020). Developing an Efficient Vector-Friendly Implementation of the Breadth-First Search Algorithm for NEC SX-Aurora TSUBASA. In: Sokolinsky, L., Zymbler, M. (eds) Parallel Computational Technologies. PCT 2020. Communications in Computer and Information Science, vol 1263. Springer, Cham. https://doi.org/10.1007/978-3-030-55326-5_10
DOI: https://doi.org/10.1007/978-3-030-55326-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55325-8
Online ISBN: 978-3-030-55326-5
eBook Packages: Computer Science, Computer Science (R0)