1 Introduction

Modern cyber-physical systems (CPS), e.g., cars, aircraft, advanced robots, and drones, are characterized by an increasing complexity that calls for new technologies and architectural solutions to guarantee predictability, safety, and security requirements. In addition, the increased level of autonomy specified for such systems requires the adoption of artificial intelligence (AI) and, more specifically, machine learning algorithms, which in turn imply heavy use of hardware acceleration to satisfy the stringent real-time constraints imposed by the applications.

Unfortunately, today’s AI algorithms are not yet ready to be integrated into mission-critical CPS, since their results cannot always be trusted and well-accepted engineering methodologies to mitigate the problem are still missing. A promising solution consists in coupling AI models with a set of classical algorithms that can take over the control of the system whenever the outputs produced by the AI are not deemed safe, with the aim of bringing the system into fail-safe or fail-operational conditions.

In such complex systems, at least two groups of software components can be distinguished, characterized by different sets of requirements and criticality levels:

  • Software components that require support from a rich execution environment (e.g., based on the Linux operating system), like AI algorithms, acquisition and processing stacks for complex sensors (such as cameras and LiDARs), and high-speed network communication services.

  • Software components that require a high-integrity execution environment (e.g., powered by a real-time operating system), like low-level control functions, safety-critical monitoring activities, and procedures to ensure fail-safe/fail-operational behavior.

The components belonging to the first group can be deemed non-critical for safety and security, provided that they are properly isolated from the critical components belonging to the second group. In this context, strong isolation is required to ensure that non-critical components cannot affect the execution of critical ones, including the guarantee that cyber-attacks and faults cannot propagate from the former to the latter.

Isolation could be achieved by executing these software components on different hardware platforms, e.g., by reserving an independent platform to host the execution of critical software only. However, in several cases, such as battery-operated flying drones, these features have to be provided under stringent resource constraints, imposing additional limitations in terms of space, weight, power, and cost (SWaP-C). For this reason, a more appropriate solution is to host the execution of software components with mixed and independent safety and security levels on the same hardware platform. Such systems are also referred to as mixed-criticality software systems and can leverage hypervisor technology to enforce isolation as well as enable the execution of multiple operating systems on the same hardware.

Mixed-criticality systems powered by hypervisor technology have been investigated for many years from different perspectives, especially in domains such as avionics (Gaska et al. 2011), aerospace (Crespo et al. 2009), and control (Crespo et al. 2018). Farrukh and West (2022) proposed a hypervisor-based architecture combining a Linux domain with a real-time critical domain on the same platform for a drone application, but no machine learning and hardware acceleration were exploited. Scordino et al. (2020) presented a modular hypervisor-based platform for industrial automation that integrates both real-time control code and software design tools, but no AI algorithms and FPGA acceleration were employed.

Similarly, the challenges of achieving real-time performance in AI-powered cyber-physical systems have been discussed and reviewed by several authors (Musliner et al. 1995; Radanliev et al. 2020; Seng et al. 2021). For instance, Wang and Luo (2022) presented a review on the optimal design of neural networks on FPGA platforms. Ji et al. (2021) presented the implementation of a deep neural network for real-time object detection and tracking on an embedded system based on an FPGA Zynq platform. Sciangula et al. (2022) proposed an efficient method for accelerating deep neural networks for autonomous driving applications on an FPGA-based SoC. None of these works, however, leveraged hypervisor technology to integrate mixed-criticality components.

A conceptual hypervisor-based architecture for supporting the execution of complex functionalities that are typical of AI-enabled CPS was proposed by Biondi et al. (2020); however, to the best of our knowledge, a practical solution that integrates the acceleration of deep neural networks with real-time control on a single platform using hypervisor technology is still missing.

1.1 Contribution

This work presents a concrete software architecture for supporting AI-enabled CPS with mixed-criticality components. The proposed architecture targets heterogeneous computing platforms that couple asymmetric multicores with programmable logic (FPGA). It leverages hypervisor technology with strong isolation, hardware acceleration of AI algorithms implemented in programmable logic, and monitoring strategies to take over the control of the system whenever non-critical software components fail, are attacked, or produce results that are deemed unsafe. The architecture is then specialized for the case of autonomous flying drones, showing how it can be used to build a safe and secure tracking application.

1.2 Paper organization

The rest of the paper is organized as follows: Sect. 2 discusses some relevant related work; Sect. 3 presents the general architectural approach; Sect. 4 describes how the proposed architecture has been instantiated to a specific use case consisting of a visual tracking application performed by a drone; Sect. 5 reports some experimental results; and, finally, Sect. 6 concludes the paper and presents some future work.

2 State of the art

To the best of our knowledge, there is no established approach in the literature for developing AI-powered cyber-physical systems; rather, several architectural solutions have been proposed by researchers in different contexts.

2.1 Hypervisor-based architectures

A classical approach to handling functions with different criticality requirements consists in executing them on separate computing platforms, typically managed by different operating systems that communicate through an external link, such as a serial line, a CAN network, or an Ethernet bus. Examples of CPS that adopt this approach are the Intel Ready-to-Fly (RTF) Drone (Intel Corporation), the Cube Autopilot (CubePilot), and several solutions that leverage the Robot Operating System (ROS). Gutiérrez et al. (2016) implemented a real-time publish-subscribe communication mechanism in the XtratuM (Crespo et al. 2009) hypervisor, integrating ARINC-653 with the Data Distribution Service (DDS). Biondi et al. (2021) proposed a hypervisor-based architecture for safety-critical embedded systems providing time/memory isolation, security, real-time communication channels, and I/O virtualization that allows different virtual machines to share peripheral devices. Farrukh and West (2022) proposed a hypervisor-based solution characterized by low overheads in accessing resources. Their approach requires strict timing guarantees for both domains, forcing the execution of Linux on a single core with SCHED_DEADLINE (Lelli et al. 2016); this is a viable solution in terms of real-time constraints, but it can introduce several limitations in the implementation of complex AI-based solutions.
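For reference, a Linux task is attached to SCHED_DEADLINE through the sched_setattr() system call; the following minimal sketch (the budget and period values are illustrative) reserves 5 ms of runtime every 20 ms for the calling thread:

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

/* Userspace definition of sched_attr (not exported by all libc versions). */
struct sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;   /* ns */
    uint64_t sched_deadline;  /* ns */
    uint64_t sched_period;    /* ns */
};

int main(void)
{
    struct sched_attr attr = {
        .size           = sizeof(attr),
        .sched_policy   = SCHED_DEADLINE,
        .sched_runtime  =  5 * 1000 * 1000,   /*  5 ms budget per period */
        .sched_deadline = 20 * 1000 * 1000,   /* 20 ms relative deadline */
        .sched_period   = 20 * 1000 * 1000,   /* 20 ms period            */
    };

    /* Attach the calling thread (pid 0) to SCHED_DEADLINE. */
    if (syscall(SYS_sched_setattr, 0, &attr, 0) != 0) {
        perror("sched_setattr");
        return 1;
    }

    /* ... periodic real-time work ... */
    return 0;
}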

2.2 Hardware acceleration

A peculiar feature of AI-powered cyber-physical systems is the massive computational workload required to execute AI algorithms such as deep neural networks. The most demanding functions of these algorithms need to be accelerated on specific hardware, such as general-purpose graphics processing units (GPUs) or field-programmable gate arrays (FPGAs), to satisfy real-time requirements.

Modern GPU-based heterogeneous platforms benefit from powerful and mature software support to accelerate AI algorithms, which allows the user to significantly contain the effort required to achieve efficient implementations of tasks like object detection, image segmentation, and tracking. The acceleration frameworks for GPU-based platforms also allow a developer to seamlessly use, with no or only a few modifications, the AI models available in frameworks such as TensorFlow, PyTorch, and Caffe, even with the native floating-point parameters.

On the other hand, when compared to FPGA-based platforms, GPU-based platforms are very demanding in terms of power consumption and struggle to provide a high degree of time predictability for hardware acceleration. Their power consumption can be one order of magnitude larger than that required by FPGA-based platforms (Sciangula et al. 2022; Qasaimeh et al. 2019). Furthermore, as observed by Cavicchioli et al. (2017), GPU acceleration introduces highly variable delays that cannot easily be bounded a priori, also due to the contention occurring on shared memory in the case of memory-intensive GPU tasks. As such, GPU-based platforms are not the ideal solution for battery-operated CPS such as drones.

Besides consuming less energy, FPGAs provide far more predictable execution behavior than GPUs for hardware acceleration.

Two main approaches are used to accelerate deep neural networks by means of FPGA technology:

1. The synthesis of a network-specific accelerator provides the best performance but suffers from poor flexibility and scalability, especially for large networks. To name one of the most relevant issues, this approach ends up deploying replicated logic that implements the same operation (e.g., convolutions) on different data. Some tools, such as HLS4ML (Fahim et al. 2021) and FINN (AMD Xilinx: FINN), provide IPs as standalone Verilog/VHDL entities, which can later be integrated into more complex designs. Another relevant limitation of this approach is that the generated IP must be entirely rebuilt whenever the network changes.

2. A more flexible solution is to accelerate neural networks by means of a dedicated softcore. For instance, Xilinx provides a Deep Learning Processor Unit (AMD Xilinx: DPU) as a library component in the Vitis-AI environment (AMD Xilinx: Vitis AI). Besides the evident flexibility benefits of a network-agnostic accelerator such as the DPU, an advantage of this approach is that a single DPU can concurrently accelerate multiple networks, whereas in the first approach the number of networks that can be accelerated is mainly limited by the amount of FPGA resources (such as LUTs). The structure of this usage pattern is sketched right after this list.

The main disadvantage of FPGAs is that they require a larger programming effort than GPUs, especially when developing a network-specific accelerator. Another restriction is the limited amount of FPGA resources available in several embedded platforms, which calls for dynamic partial reconfiguration (Biondi et al. 2016; Seyoum et al. 2021) of the FPGA at the cost of additional delays when serving acceleration requests. Furthermore, as observed by some authors (Vaishnav et al. 2018; Happe et al. 2015; Rupnow et al. 2009), even without dynamic reconfiguration, sharing an FPGA among tasks managed by a preemptive scheduling policy is not trivial, due to the significant amount of time required to save the state of the device.

Fortunately, especially in the case of softcore accelerators such as the DPU, compilation and optimization frameworks are available that drastically simplify the deployment of accelerated neural networks. These frameworks employ pruning and quantization (Zhou et al. 2017) of the network parameters to achieve an efficient execution on the FPGA. The accuracy drop caused by these optimization processes was found not to be significant in several application scenarios (Gholami et al. 2022; Liang et al. 2021). The optimization algorithms dealing with the conversion from floating-point to integer values are indeed now efficient enough to guarantee consistency in the transformation of the models from one platform to another (GPU to FPGA) (Ding et al.).
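As a minimal illustration of the quantization step, a symmetric linear scheme maps floating-point weights to 8-bit integers through a per-tensor scale factor; the sketch below is generic and does not reproduce the specific algorithm used by the deployment tools:

#include <math.h>
#include <stdint.h>
#include <stddef.h>

/* Symmetric per-tensor int8 quantization: q = round(w / s), s = max|w| / 127. */
static float quantize_int8(const float *w, int8_t *q, size_t n)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++)
        if (fabsf(w[i]) > max_abs)
            max_abs = fabsf(w[i]);

    float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
    for (size_t i = 0; i < n; i++)
        q[i] = (int8_t)lrintf(w[i] / scale);

    return scale; /* kept to dequantize: w is approximated by q * scale */
}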

  • ZCU104 board by Xilinx/AMD, equipped with an Ultrascale+ MPSoC (XCZU7EV);

  • Xilinx/AMD DPU accelerator (DPUCZDX8G), to be deployed onto the FPGA fabric of the Ultrascale+;

  • CLARE-Hypervisor by Accelerat (Accelerat: The CLARE Software Stack);

  • Linux operating system for the non-critical, high-performance domain; and

  • FreeRTOS as the real-time operating system for the critical domain.

The ZCU104, although conceived as a development board, allows matching SWaP-C constraints for several target applications, at least in their prototype stage. At the same time, the amount of FPGA resources available in the MPSoC installed on the ZCU104 allows deploying peripherals that are missing on the board, with a significant speed-up and flexibility in the hardware setup. Finally, the Deep Learning Processor Unit (DPU) by Xilinx/AMD can be deployed on the available FPGA fabric to accelerate the inference of deep neural networks.

CLARE-Hypervisor has been selected because it explicitly targets mixed-criticality CPS applications. It has been designed to support modern heterogeneous platforms, such as GPGPU- and FPGA-based MPSoCs, and to safely and securely control their computational resources. CLARE-Hypervisor also provides multi-domain virtualization of the FPGA area, enabling strong isolation also for PL components such as hardware accelerators.

Linux has been selected as the GPOS for its extensive support for peripheral drivers, communication stacks, and modern AI frameworks.

FreeRTOS has been selected as the RTOS because its execution model is suitable for timing analysis and, thanks to its wide diffusion, it includes a rich set of drivers for low-level devices.

The resulting specialized architecture is illustrated in Fig. 1.

Fig. 1 Illustration of the proposed specialized architecture

4 The case for autonomous drones

This section describes how the architecture presented in Sect. 3 can be used to implement an AI-powered visual tracking application on a quadcopter drone equipped with an inertial measurement unit (IMU), a camera for object tracking, and two directional LiDAR sensors for obstacle detection, one pointing forward and one backward.

The overall block diagram of the multi-domain application that controls the drone is illustrated in Fig. 2, which also distinguishes the functions executed in the Linux domain (blue blocks) from those running in the critical domain (orange blocks). In particular, the ARMv8 processing system is partitioned across domains, so that three of the four cores are assigned to the Linux domain, while the remaining one is assigned to the FreeRTOS domain. The figure also highlights with a double border the modules that are either entirely implemented in FPGA or leverage the FPGA to accelerate some functions.

Fig. 2 Function diagram of the multi-domain application that controls the drone

The main task of the Linux domain is the inference of a deep neural network (DNN) for real-time multiple object tracking, using a strategy derived from DeepSORT (Wojke et al. 2017) and ByteTrack (Zhang et al. 2021). The generated bounding boxes, each paired with a unique object ID, are used to compute a setpoint for the low-level drone controller running in the critical domain. In this context, support for hardware acceleration is essential to achieve acceptable performance, because all state-of-the-art neural trackers generate a significant workload that has to be executed in real time (normally at the camera frame rate). Table 1 summarizes the main functions that compose the system.

Table 1 Application functions

4.1 Devices synthesized on the FPGA

The FPGA is used to synthesize a number of devices that are assigned to the virtual machines by the hypervisor. Each device is exclusively assigned in pass-through mode to a single domain, while the hypervisor is responsible for providing strong isolation. In particular, the Xilinx DPU device is accessible by the Linux domain, while all the other devices are assigned to the critical domain. The custom devices synthesized on the programmable logic are described in the following list:

1. DPU: The Deep Learning Processor Unit (DPU) is a softcore provided by Xilinx to efficiently accelerate the inference of deep neural networks.

2. Radio decoder: It takes the pulse position modulated (PPM) signal from the radio receiver, decodes it, and stores the corresponding digital values in a set of registers. Without the help of specialized hardware, PPM signals would have to be managed in software using, for instance, GPIOs configured to raise an interrupt at each signal edge. This may easily lead to poor performance and excessive interference on the processors due to interrupt handling and the consequent context switches. The use of a dedicated FPGA component to handle the PPM signal of the radio receiver hence relieves the processors from this burden and reduces the corresponding overhead and jitter (a minimal read-side sketch is provided after this list).

3. \(I^2C\) device: The ZCU104 board exposes an \(I^2C\) peripheral working with 1.8 V logic levels, while the adopted IMU works with 3.3 V logic levels. To avoid introducing third-party electronics to adapt the logic levels (e.g., a voltage-level translator), an AXI-based 3.3 V \(I^2C\) master device to be deployed on the FPGA was developed.

4. UART device: A custom AXI-based UART peripheral to be deployed on the FPGA was developed for the same reasons mentioned above, given that the adopted LiDAR works with 3.3 V logic levels.

5. PWM driver: It is used to generate pulse width modulation (PWM) signals to drive the drone motors. Although the Ultrascale+ MPSoC allows generating PWM signals by means of its triple timer counters (TTC), a specialized FPGA module was developed for the sake of simplicity and flexibility.

Efficient implementations of the drivers for the above peripherals (except the DPU) were developed from scratch to offload the CPU as much as possible, as well as to minimize execution-time variability and the number of memory accesses.
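To give an idea of how lightweight the resulting drivers can be, the following sketch reads the decoded radio channels from the memory-mapped registers exposed by the PPM decoder; the base address and register layout are illustrative assumptions rather than the actual design:

#include <stdint.h>

/* Illustrative register map of the FPGA radio decoder (assumed layout):
 * one 32-bit register per decoded PPM channel, starting at PPM_BASE. */
#define PPM_BASE      0xA0010000UL   /* hypothetical AXI address */
#define PPM_CHANNELS  8

static volatile uint32_t *const ppm_regs = (volatile uint32_t *)PPM_BASE;

/* Reading a channel is a single AXI read: no interrupts and no edge
 * timestamping in software, hence negligible CPU overhead and jitter. */
static inline uint32_t ppm_read_channel(unsigned ch)
{
    return ppm_regs[ch];  /* decoded pulse width, e.g., in microseconds */
}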

4.2 Inter-domain communication channels

The two domains exchange data by means of two non-blocking communication channels based on shared-memory regions provided by CLARE, where the Linux domain acts as a producer and the critical one as a consumer. The channels are accessed through a middleware (available for both Linux and FreeRTOS) that does not require the intervention of the hypervisor at each access and ensures wait-free synchronization in the presence of concurrent accesses. The first channel is used to exchange setpoints for the drone controller, whereas the second one is used to transmit heartbeat packets for health monitoring.

4.3 Linux domain

The Linux domain is responsible for visual tracking and navigation. It includes four tasks, namely Camera, Detector, Tracker, and HB generator. Details on these tasks are reported in Table 2.

Table 2 Linux application-level task set. Priority values range from 1 to 99, where higher values correspond to higher priorities

The Camera task periodically captures a new frame from the camera and puts it in a queue of frames ready to be processed. The Detector task performs object detection by accelerating the inference of a YOLOv3 network (Redmon and Farhadi) on the DPU.

Fig. 6 Frame rate distributions for the Detector task on a 3-core configuration without hypervisor (red bars) and with hypervisor (blue bars)

In another test, the object detection performance of the 3-core configuration with hypervisor has been compared with the one achievable on the full ZCU104 without hypervisor, that is, assigning all four cores to Linux. The results are illustrated in Fig. 7, which shows that, by allocating one extra core to Linux, the average frame rate of the object detection task increases from 27.5 FPS to 29.6 FPS.

Note that, in the 4-core configuration, the object detection pipeline uses four parallel threads to match the number of physical cores available on the platform. As expected, this leads to a performance increase, but the observed improvement is not significant, since the processing pipeline is constrained by the acquisition rate of the camera (30 FPS), which limits the benefit of the increased hardware and software parallelism.
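As a minor implementation note, each worker thread of the pipeline can be pinned to one of the cores assigned to the Linux domain through standard pthread affinities; a minimal sketch (core indices are illustrative) follows:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to one core of the Linux domain
 * (e.g., cores 0-2 in the 3-core configuration). */
static int pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}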

Fig. 7 Frame rate distributions for the Detector task on a 3-core configuration with hypervisor (blue bars) and on a 4-core configuration without hypervisor (red bars)

Viewed from another angle, the results reported in Fig. 7 show that the two-domain architecture enabled by the hypervisor does not significantly degrade performance with respect to a full-platform configuration, while providing relevant advantages in terms of time predictability and security for the critical components of the system.

5.3 Application-level end-to-end delays

This section reports on two experiments aimed at measuring the time it takes for the generated data to propagate from one domain to the other. Since this operation involves a data exchange between two very different operating systems, the communication latency depends on several factors. In particular, from the moment the data is written by Linux into the shared memory, three factors come into play: (i) the period of the consumer task running in the FreeRTOS domain, (ii) the time at which this task is scheduled by FreeRTOS, and (iii) the interference experienced by the consumer from the other tasks, which depends on the assigned priorities and the task execution times.

Figure 8 illustrates a possible interleaving of the producer and consumer tasks where the delay is significant. In the figure, the message is sent by the producer at time \(t_1\), delivered to the other domain at time \(t_a\), and finally consumed at time \(t_2\). As can be seen, the overall end-to-end delay (\(t_2 - t_1\)) is given by the sum of the channel latency (\(L_c\)), the activation interval (\(A_c\)), the interference of the high-priority tasks (\(I_{hp}\)), and the computation time of the consumer task (\(C_c\)). Since the channel latency is always below one microsecond and the execution times of the FreeRTOS tasks are in the order of a few hundred microseconds, the major contribution to the end-to-end communication delay is due to the period of the consumer task, which is in the order of milliseconds.
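Putting these observations together, and denoting by \(T_c\) the period of the consumer task (so that \(A_c \le T_c\)), the end-to-end delay can be bounded as \(t_2 - t_1 = L_c + A_c + I_{hp} + C_c \le L_c + T_c + I_{hp} + C_c\). For the setpoint channel, \(L_c < 1\,\mu\)s, \(I_{hp} + C_c\) amounts to a few hundred microseconds, and \(T_c = 10\) ms, so the bound is dominated by \(T_c\); this is consistent with the measurements reported below.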

Fig. 8 Example of task interleaving characterized by a long end-to-end delay. In this case, the delay is mainly dominated by the activation interval \(A_c\), which is of the same order of magnitude as the consumer period \(T_c\) (ms), whereas the channel latency \(L_c\), the interference \(I_{hp}\) from the high-priority tasks, and the consumer computation time \(C_c\) are at least two orders of magnitude smaller

The best-case situation for the end-to-end communication delay is illustrated in Fig. 9, where the consumer task is executed just after the message is delivered to the FreeRTOS domain. In this case, the end-to-end delay is in the order of a few hundred microseconds.

Fig. 9 Example of task interleaving characterized by a short end-to-end delay. It is mainly caused by the channel latency \(L_c\) and the execution time \(C_c\) of the consumer task

The end-to-end delay measurements performed in this experiment confirm the observations reported above. Figure 10 shows the distribution of the end-to-end delay for the setpoint communication, measured over one hour of continuous execution from the time the setpoint is generated in Linux to the time it is read in FreeRTOS by the Safety module task, which has a period of 10 ms and the lowest priority.

Fig. 10 Setpoint transmission delay

As expected, the maximum observed delay was 9.5 ms (close to the 10 ms period assigned to the Safety module task), whereas the minimum observed delay was 890 \(\mu \)s, given by the sum of the task's own execution time and the interference suffered from higher-priority tasks.

Figure 11 shows the distribution of the end-to-end delay from the time the heartbeat is generated in Linux to the time it is received in FreeRTOS.

Fig. 11 Heartbeat transmission delay

In this case, the maximum observed delay was 3.74 ms (close to the 4 ms period of the HB checker task), whereas the minimum observed delay was about 303 \(\mu \)s, shorter than in the previous case because the HB checker task has the highest priority and hence cannot suffer interference from the other tasks.

5.4 Fault reaction time

A final experiment was carried out to measure the latency of the fail-safe procedure triggered by a system fault. For this specific test, the drone was programmed to track a target by controlling only the yaw angle. Accordingly, Fig. 12 reports the variation of the yaw angle during a tracking operation, when the backup controller is invoked to keep the drone in a safe state after a system fault is injected in Linux.

Fig. 12 Reaction time of the backup controller to a system fault

In this experiment, the heartbeat validation threshold was set to 3, meaning that the HB checker task triggers the fail-safe procedure after 3 consecutive readings of the heartbeat channel in the FreeRTOS domain without detecting an update. Since the period of the HB checker is 4 ms and the threshold is 3, the expected fault detection delay is between 8 and 12 ms. In Fig. 12, the measured delay is represented by the purple arrow between the two vertical red dashed lines, denoting the transient interval between normal functioning and fail-safe mode.
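A minimal sketch of this logic, written as a FreeRTOS task, is reported below; the task and hook names are illustrative, and the sketch assumes a tick rate of at least 250 Hz so that the 4 ms period is representable:

#include <stdint.h>
#include "FreeRTOS.h"
#include "task.h"

#define HB_PERIOD_MS  4   /* period of the HB checker task */
#define HB_THRESHOLD  3   /* consecutive stale readings before fail-safe */

extern uint32_t channel_read_heartbeat(void); /* wait-free channel read (hypothetical) */
extern void     enter_fail_safe(void);        /* activates the backup controller (hypothetical) */

static void hb_checker_task(void *arg)
{
    (void)arg;
    uint32_t last_hb = channel_read_heartbeat();
    unsigned stale = 0;
    TickType_t wake = xTaskGetTickCount();

    for (;;) {
        vTaskDelayUntil(&wake, pdMS_TO_TICKS(HB_PERIOD_MS));
        uint32_t hb = channel_read_heartbeat();
        if (hb == last_hb) {
            /* No heartbeat update observed in this period. */
            if (++stale >= HB_THRESHOLD)
                enter_fail_safe();
        } else {
            stale = 0;
            last_hb = hb;
        }
    }
}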

In this test, the fault was injected in Linux 6.235 s after the beginning of the plot, and the backup controller took over at \(t = 6.245\) s, that is, after about 10 ms. The orange horizontal line highlights the yaw angle recorded when the fail-safe mode was enabled. In this implementation, the backup controller has been programmed to keep the yaw angle at that reference value.

The plot shows that, when the backup controller was activated, the yaw angle was \(36.3^{\circ }\) and the flight controller acted to keep the yaw angle at this reference value, while simultaneously keeping both pitch and roll angles at \(0^{\circ }\) (hovering). Notice that a small overshoot occurs in most cases, since the yaw PID controller is the one with the least authority over the physical system. In fact, yaw actuation is generated by differences in motor torques, while pitch and roll are actuated by changing thrusts, so the faster the control (i.e., the larger the gains), the higher the overshoot to be compensated. This can be observed in the figure after the second red line, where the angle continues increasing for a few milliseconds, after which it converges to the reference angle recorded at the fail-safe activation. This behavior is caused by the system dynamics rather than by a delay in the execution of the flight control task.
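For illustration, the yaw-hold behavior of the backup controller can be sketched as a standard PID loop around the reference angle recorded at fail-safe activation; the gains and the angle-wrapping helper below are illustrative placeholders, not the values tuned for the drone:

#include <math.h>

/* Illustrative PID for holding the yaw angle recorded at fail-safe entry. */
typedef struct {
    float kp, ki, kd;
    float integ, prev_err;
} yaw_pid_t;

/* Wrap an angle error into (-pi, pi] so the controller always takes
 * the shortest rotation toward the reference. */
static float wrap_pi(float a)
{
    while (a >  (float)M_PI) a -= 2.0f * (float)M_PI;
    while (a <= -(float)M_PI) a += 2.0f * (float)M_PI;
    return a;
}

/* One control step: returns the yaw torque command. */
static float yaw_pid_step(yaw_pid_t *p, float ref, float meas, float dt)
{
    float e = wrap_pi(ref - meas);
    p->integ += e * dt;
    float d = (e - p->prev_err) / dt;
    p->prev_err = e;
    return p->kp * e + p->ki * p->integ + p->kd * d;
}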

Note that, without the fail-safe mechanism, the drone would no longer receive a setpoint and would therefore continue to apply the last valid setpoint received, thus continuing to rotate around itself. In a more complex use case involving multiple degrees of freedom, this situation would inevitably lead to dangerous consequences.