Design of High-performance Heterogeneous Integrated Circuits

Melikyan, Vazgen

doi:10.1007/978-3-031-50714-4_5

Abstract

This chapter is devoted to developing design means for high-performance heterogeneous ICs that eliminate shortcomings in circuit operation, increase performance, and make these circuits universal.

Principles are proposed for developing design means for high-performance heterogeneous integrated circuits. These principles significantly improve the circuits' main technical parameters, performance, and data transmission mechanisms between components, and reduce design time.

A method has been developed to improve the means of data transmission between components in high-performance heterogeneous integrated circuits. Owing to a modified architecture, it reduces the number of data bits eightfold at the cost of a 2.25% increase in the area used in the core.

A method has been developed to improve the means of data transmission between clock domains in high-performance heterogeneous integrated circuits. Owing to a mixed-signal architecture, it reduces delay by at least 50% at the cost of an average 21% increase in occupied area.

A method has been proposed for implementing the architecture of heterogeneous integrated circuits. Owing to a scheduler, a memory management unit, direct memory access, and a special command set, it provides a 32.48% increase in speed at the cost of an 11% increase in area.

Author information

Authors and Affiliations

Synopsys Armenia CJSC, Yerevan, Armenia
Vazgen Melikyan

About this chapter

Cite this chapter

Melikyan, V. (2024). Design of High-performance Heterogeneous Integrated Circuits. In: Machine Learning-based Design and Optimization of High-Speed Circuits. Springer, Cham. https://doi.org/10.1007/978-3-031-50714-4_5

DOI: https://doi.org/10.1007/978-3-031-50714-4_5
Published: 31 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50713-7
Online ISBN: 978-3-031-50714-4
eBook Packages: Engineering, Engineering (R0)
