Towards Unification of Accelerated Computing and Interconnection For Extreme-Scale Computing

Hanawa, Toshihiro; Kodama, Yuetsu; Boku, Taisuke; Amano, Hideharu; Murai, Hitoshi; Umemura, Masayuki; Sato, Mitsuhisa

doi:10.1007/978-3-319-16214-0_43

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9040)

Included in the following conference series:

International Symposium on Applied Reconfigurable Computing

Abstract

Heterogeneous clusters using accelerators are widely used for high-performance computing systems. In such systems, inter-node communication among accelerators becomes a bottleneck because data must be transferred between the accelerator and the host.

To eliminate this overhead, we have been developing a novel communication system that realizes direct communication among accelerators across computation nodes, under the HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) project. We are also investigating a high-level parallel programming language and several practical application programs based on this concept, as well as studying enhancements to TCA (Tightly Coupled Accelerators) and developing a system software stack in the CREST project.

Author information

Authors and Affiliations

Information Technology Center, The University of Tokyo, Kashiwa, Japan
Toshihiro Hanawa
Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
Yuetsu Kodama, Taisuke Boku, Masayuki Umemura & Mitsuhisa Sato
Department of Information and Computer Science, Keio University, Yokohama, Japan
Hideharu Amano
RIKEN Advanced Institute for Computational Science, Kobe, Japan
Hitoshi Murai

Corresponding author

Correspondence to Toshihiro Hanawa.

Editor information

Editors and Affiliations

Tohoku University, Sendai, Japan
Kentaro Sano
National Technical University of Athens, Athens, Greece
Dimitrios Soudris
Ruhr-Universität Bochum, Bochum, Germany
Michael Hübner
University of Southern California, Marina del Rey, California, USA
Pedro C. Diniz

About this paper

Cite this paper

Hanawa, T. et al. (2015). Towards Unification of Accelerated Computing and Interconnection For Extreme-Scale Computing. In: Sano, K., Soudris, D., Hübner, M., Diniz, P. (eds) Applied Reconfigurable Computing. ARC 2015. Lecture Notes in Computer Science, vol 9040. Springer, Cham. https://doi.org/10.1007/978-3-319-16214-0_43

DOI: https://doi.org/10.1007/978-3-319-16214-0_43
Published: 31 March 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16213-3
Online ISBN: 978-3-319-16214-0
eBook Packages: Computer Science, Computer Science (R0)
