Dynamic scheduler implementation used for load distribution between hardware accelerators (RTL) and software tasks (CPU) in heterogeneous systems

Tănase, Cristian Andy

doi:10.1007/s11227-020-03242-w

Published: 13 March 2020

Volume 76, pages 10122–10139 (2020)

The Journal of Supercomputing

Abstract

This article describes the implementation of a dynamic scheduler that distributes load between an RTL hardware accelerator and a software task running on a CPU. A Xilinx Zynq SoC device consists of a processing system (PS) coupled with FPGA programmable logic (PL). The two sections are connected through a number of Advanced eXtensible Interfaces (AXI). Hardware accelerators are mechanisms whereby software algorithms are implemented as register-transfer level (RTL) logic in the PL, which increases processing speed. In this article, we present a dynamic scheduler that distributes the load between the host processor and the RTL accelerator. In some situations, the accelerator cannot keep up with the flow of data coming from the memory system (shared memory), even with its higher processing speed. The accelerator must then be “aided” by a software module running on a CPU in the PS section. The scheduler checks whether a hardware data-processing module meets the hard real-time requirement (data are processed within a well-defined time frame). If it does not, the scheduler activates a software thread on a CPU to support the hardware thread: part of the total data that the RTL thread would otherwise process is handed to the SW thread, so the RTL thread has less data to process. The SW thread is activated only when the system must respond in real time and the amount of data cannot be processed within the required time. In this way, the scheduler detects when the software thread must be activated to “help” the hardware thread process the data. The scheduler self-adjusts so that the software thread always executes a number of instructions, without introducing delays into the RTL thread, which is much faster. The project used the PYNQ-Z2 board, Vivado 2018.3, and Jupyter Notebook.
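The abstract describes the scheduler's decision rule only in prose. The Python sketch below shows one way such a rule could be expressed. It is not the author's implementation, and it calls no PYNQ API. It assumes a simple throughput model in which each side processes a known number of items per second. The names `ThroughputEstimate` and `split_load`, the proportional split, and the exponential-moving-average "self-adjustment" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ThroughputEstimate:
    """Hypothetical running estimate of items processed per second (> 0)."""
    rate: float          # current estimate, items per second
    alpha: float = 0.2   # smoothing factor used for self-adjustment

    def update(self, items: int, seconds: float) -> None:
        # Blend the newly measured throughput into the estimate
        # (exponential moving average); ignore empty or invalid samples.
        if items > 0 and seconds > 0:
            measured = items / seconds
            self.rate = (1 - self.alpha) * self.rate + self.alpha * measured


def split_load(n_items: int, deadline_s: float,
               hw: ThroughputEstimate,
               sw: ThroughputEstimate) -> Tuple[int, int]:
    """Return (hw_items, sw_items) for one batch of data.

    The software thread receives work only when the RTL accelerator
    alone would miss the deadline.
    """
    if n_items / hw.rate <= deadline_s:
        return n_items, 0  # hardware meets the deadline on its own
    # Split in proportion to throughput so both sides finish at about
    # the same time when they run concurrently.
    sw_items = round(n_items * sw.rate / (hw.rate + sw.rate))
    return n_items - sw_items, sw_items


if __name__ == "__main__":
    hw = ThroughputEstimate(rate=1000.0)  # assumed accelerator rate
    sw = ThroughputEstimate(rate=250.0)   # assumed CPU thread rate
    print(split_load(800, 1.0, hw, sw))   # HW alone is fast enough
    print(split_load(1200, 1.0, hw, sw))  # SW thread is activated
    sw.update(240, 1.2)                   # measured SW was slower
    print(split_load(1200, 1.0, hw, sw))  # SW share shrinks
```

With these assumed rates, a 1200-item batch with a 1 s deadline is split 960/240, so each side needs about 0.96 s. If `n_items / (hw.rate + sw.rate)` still exceeds the deadline, no split can meet it, and a real scheduler would have to report that case rather than silently accept the split.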

Author information

Authors and Affiliations

“Stefan cel Mare” University of Suceava, 13 Universitătii, 720229, Suceava, Romania
Cristian Andy Tănase

Corresponding author

Correspondence to Cristian Andy Tănase.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tănase, C.A. Dynamic scheduler implementation used for load distribution between hardware accelerators (RTL) and software tasks (CPU) in heterogeneous systems. J Supercomput 76, 10122–10139 (2020). https://doi.org/10.1007/s11227-020-03242-w

Issue Date: December 2020
DOI: https://doi.org/10.1007/s11227-020-03242-w
