Field-Programmable Gate Array Architecture

  • Living reference work entry
Handbook of Computer Architecture

Abstract

Since their inception more than thirty years ago, field-programmable gate arrays (FPGAs) have grown more complex, more capable, and more diverse in their applications. FPGAs can be reprogrammed at a fundamental level, changing the function and interconnection of millions of elements. By reconfiguring their hardware to match the application, FPGAs often achieve higher energy efficiency, lower latency, or faster time-to-market across a very wide range of application domains. A modern FPGA combines many components, from logic blocks, programmable routing, and memory blocks to networks-on-chip and processor subsystems. For best efficiency, each component must be carefully architected to match the needs of a wide range of applications and to mesh well with the other components. Designing these components involves many choices, from high-level architectural parameters down to transistor-level implementation details. This chapter describes the evolution of these FPGA components, their design principles, and their implementation challenges.

Author information

Corresponding author

Correspondence to Vaughn Betz.

Copyright information

© 2023 Springer Nature Singapore Pte Ltd.

About this entry

Cite this entry

Boutros, A., Betz, V. (2023). Field-Programmable Gate Array Architecture. In: Chattopadhyay, A. (eds) Handbook of Computer Architecture. Springer, Singapore. https://doi.org/10.1007/978-981-15-6401-7_49-1

  • DOI: https://doi.org/10.1007/978-981-15-6401-7_49-1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-6401-7

  • Online ISBN: 978-981-15-6401-7

  • eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering
