Towards a Deep-Pipelined Architecture for Accelerating Deep GCN on a Multi-FPGA Platform

Cheng, Qixuan; Wen, Mei; Shen, Junzhong; Wang, Deguang; Zhang, Chunyuan

doi:10.1007/978-3-030-60245-1_36

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12452)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

Convolutional neural networks (CNNs) have achieved great success in learning features from Euclidean-structured data, but many learning tasks require dealing with graph data. In these application scenarios, where CNNs cannot operate, graph convolutional networks (GCNs) have shown appealing performance and attracted increasing attention in recent years. However, according to our research, the computational complexity and storage overhead of the network also increase, making it a challenge to accelerate on a single FPGA. Accordingly, in this work, we focus on accelerating a deep GCN (DAGCN) on a CPU-multi-FPGA platform by proposing a deep-pipelined acceleration scheme. To fully exploit the parallelism that exists in DAGCN, we propose a graph convolutional neural accelerator (GCNAR) characterized by the integration of multiple 1-D systolic arrays. In the design of GCNAR we also adopt an existing CSR-based partitioning scheme for large-scale matrix-vector multiplication, which effectively improves the computational efficiency of GCNAR. Moreover, we develop performance and resource evaluation models to help determine the design parameters that maximize accelerator throughput. Evaluation on real-world graph datasets demonstrates that our FPGA-based solution achieves performance comparable to state-of-the-art GCN accelerators. In addition, compared to CPU and GPU solutions, our accelerator improves processing latency for graph classification by 196 times and 115 times, respectively.
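As a rough illustration of the kind of operation the abstract refers to, the following sketch multiplies a sparse matrix stored in CSR (compressed sparse row) form by a dense vector, with the rows split into contiguous blocks that could in principle be handled by separate compute units. This is not the partitioning scheme adopted in the paper, whose details are not given here; the function names (`dense_to_csr`, `csr_spmv_rows`, `partitioned_spmv`) and the simple equal-sized row blocking are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(dense: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense matrix to CSR arrays (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in values/col_idx
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv_rows(values, col_idx, row_ptr, x, row_start, row_end) -> List[float]:
    """Compute y[row_start:row_end] of y = A @ x for a CSR matrix A."""
    out: List[float] = []
    for i in range(row_start, row_end):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        out.append(acc)
    return out


def partitioned_spmv(values, col_idx, row_ptr, x, num_parts: int) -> List[float]:
    """Split rows into num_parts contiguous blocks (hypothetical scheme)
    and evaluate each block independently, then concatenate the results."""
    n_rows = len(row_ptr) - 1
    block = (n_rows + num_parts - 1) // num_parts  # ceiling division
    y: List[float] = []
    for p in range(num_parts):
        start = min(p * block, n_rows)
        end = min(start + block, n_rows)
        y.extend(csr_spmv_rows(values, col_idx, row_ptr, x, start, end))
    return y


# Adjacency matrix of a 4-node path graph and a feature vector
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
values, col_idx, row_ptr = dense_to_csr(adjacency)
y = partitioned_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0, 4.0], num_parts=2)
```

Because each row block reads only its own slice of `row_ptr`, the blocks are independent of one another, which is the property a hardware design would rely on to process them in parallel.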

Supported by the National Natural Science Foundation of China (NSFC) project 61802420 and National Program on Key Basic Research Projects 2016YFB1000401 and 2016YFB1000403.

Author information

Authors and Affiliations

School of Computer Science, National University of Defense Technology, Changsha, Hunan, China
Qixuan Cheng, Mei Wen, Junzhong Shen, Deguang Wang & Chunyuan Zhang

Corresponding author

Correspondence to Mei Wen.

Editor information

Editors and Affiliations

Columbia University, New York, NY, USA
Meikang Qiu

Cite this paper

Cheng, Q., Wen, M., Shen, J., Wang, D., Zhang, C. (2020). Towards a Deep-Pipelined Architecture for Accelerating Deep GCN on a Multi-FPGA Platform. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science, vol 12452. Springer, Cham. https://doi.org/10.1007/978-3-030-60245-1_36

DOI: https://doi.org/10.1007/978-3-030-60245-1_36
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60244-4
Online ISBN: 978-3-030-60245-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
