An integrated and real-time social distancing, mask detection, and facial temperature video measurement system for pandemic monitoring

Elhanashi, Abdussalam; Saponara, Sergio; Dini, Pierpaolo; Zheng, Qinghe; Morita, Daiki; Raytchev, Bisser

doi:10.1007/s11554-023-01353-0

Research
Open access
Published: 16 August 2023

Volume 20, article number 95, (2023)

Journal of Real-Time Image Processing

Abdussalam Elhanashi¹,
Sergio Saponara¹,
Pierpaolo Dini¹,
Qinghe Zheng²,
Daiki Morita³ &
…
Bisser Raytchev³

Abstract

This paper presents a new Edge-AI algorithm for real-time, multi-feature (social distancing, mask detection, and facial temperature) measurement to minimize the spread of COVID-19 among individuals. COVID-19 has accentuated the need for an intelligent video surveillance system that can monitor social distancing, detect masks, and measure facial temperature simultaneously using deep learning (DL) models. In this research, we fuse three different YOLOv4-tiny object detectors, one for each task of the integrated system. This DL model is used for object detection and is targeted at real-time applications. The proposed models have been trained on different data sets, covering people detection, mask detection, and facial detection for temperature measurement, and evaluated on these existing data sets. Thermal and visible cameras have been used for the proposed approach. The thermal camera is used for social distancing and facial temperature measurement, while the visible camera is used for mask detection. The proposed method has been executed on NVIDIA platforms to assess algorithmic performance. To evaluate the trained models, accuracy, recall, and precision have been measured. We obtained promising results for real-time human detection. Different pairs of thermal and visible cameras and different NVIDIA edge platforms have been adopted to explore solutions with different trade-offs between cost and performance. The multi-feature algorithm is designed to monitor individuals continuously in the targeted environments, thus reducing the impact of COVID-19 spread.

1 Introduction

A. Motivations

The ongoing COVID-19 pandemic has had a negative impact on the development of society, the economy, and the environment worldwide [1]. COVID-19 has spread widely, mainly through three routes: direct transmission, when droplets are breathed in during close-range interaction; aerosol transmission, when droplets mixed with air form an aerosol that is inhaled [2]; and contact transmission, when droplets deposited on objects reach the nasal and oral cavities, eyes, or mucous membranes via non-sanitized hands. Recorded symptoms of infection are fever, dry cough, general fatigue, nasal congestion, and, more rarely, hypoxia. In the most severe cases, 50% have dyspnea after the first week, which can develop into acute respiratory distress, septic shock, metabolic acidosis, hemorrhage, and coagulation dysfunction. Most patients recover well, but a not-insignificant percentage remain in critical condition or even die. Many countries have taken restrictive measures to limit the spread of infection [3, 4], but with relatively little success. Even now, the key elements for ensuring the safety of individuals are technologies that can monitor social distance [5,7,8], face masks, and body temperature [9,10,11,12]. To this end, AI-based systems offer a promising solution.

This paper proposes integrating, on an embedded platform, three parallelized models based on YOLOv4, a widely used deep-learning object detector. The goal is to increase the level of detail in detecting the behaviors of individuals that often cause the spread of infection.

B. State-of-the-art overview

Convolutional neural network (CNN) models appear to be best suited for applications in image reconstruction and classification [13, 14], object detection [15], and instance segmentation [16]. They are also exploited for their ability to extract features and handle limited or incomplete data sets [17, 18]. YOLO appears to be the most widely used of the CNN-based models because it can be integrated into real-time systems [19]. In previous work [21], a YOLOv4-tiny model was proposed, but it is limited to single-feature detection (social distancing only, enhanced with a bird's eye view to correct perspective) and takes input from a thermal camera only.

In contrast, this work addresses real-time multi-feature detection using thermal and visible cameras. Using multiple DL models, the proposed method detects humans and faces with bounding boxes. The detected boxes are then processed to classify whether the individual wears a mask. Meanwhile, the proposed approach runs as a standalone application that approximates the distance between individuals and measures their facial temperature. DL is used today in different real-time applications to protect people's lives from hazards such as fire disasters [22], as well as in health care and in facial feature analysis by processing images or surveillance video. Compared with previous work, in addition to changing the application, we have improved aspects related to the computational capabilities of the DL models and enhanced the integration flow on the embedded system to ensure real-time throughput, which allows us to run as many as three different YOLO models in parallel on different cores. Furthermore, several researchers combine ResNet50 [23] and YOLOv3 [24] lightweight neural network architectures with transfer learning techniques to balance resource constraints against object detection accuracy. In recent years, DL object recognition techniques [25] have been exploited extensively in computer vision tasks and can potentially be more effective than shallow models in solving complex problems. DL recognition models emphasize feature and contextual learning [26]. Object detection architectures [27] are split into two categories: two-stage models such as FPN [28], Mask R-CNN [29], and Faster R-CNN [30], and single-stage models such as YOLO [31], YOLOv2 [32], and YOLOv4 [36]. A video camera combined with a DL algorithm can be used to detect face masks and people violating social distancing measures. Moreover, the DL algorithm provides an effective process for extracting features from the images.
The authors of [37] proposed a framework for face mask detection and social distance monitoring to reduce the spread of COVID-19 between individuals. They implemented their work on a Raspberry Pi 4, which can perform multiple activities simultaneously. Embedded deep learning algorithms are gaining increasing attention for different object detection and tracking applications [38]. The authors of [39] proposed a system that performs face mask detection, temperature measurement, and social distance measurement to protect individuals from COVID-19. They presented an integrated approach built on an Arduino Uno and Raspberry Pi-based IoT system. In [40], the authors proposed a non-real-time detection system for identifying COVID-19 by applying DL models to chest X-ray images.

This system proved to be very accurate and hence quite beneficial for radiologists in the prompt detection of COVID-19. Artificial intelligence-enabled technology solutions, such as self-explanatory digital solutions, are needed to deal with the post-pandemic situation in society and industry. They will provide strong support in minimizing the impact of COVID-19 on economic circumstances [41, 42]. A previous study performed randomized social distancing and mask detection trials, which found that an inexpensive intervention would help interrupt respiratory virus transmission in society [43]. Recent studies have addressed the handling of community gatherings using different methods to minimize the spread of COVID-19 among individuals, such as social distancing, mask usage, and temperature measurement, which is also an essential tool for detecting symptoms of the virus. These studies used one method or a combination of two methods to prevent the spread of COVID-19. However, they have some limitations from a conceptual framework point of view. The evidence in the explored literature shows the need to devise an efficient method that strengthens deep learning technology to respond effectively to the outbreak. In this paper, we propose an integrated approach that incorporates all three technologies (mask detection, social distancing, and temperature measurement), which can provide numerous advantages in controlling the spread of infectious diseases. It can help identify individuals who may be infected but are asymptomatic and provide real-time data on compliance with public health guidelines. Furthermore, an integrated approach can help to overcome the limitations of using each technology individually. Table 1 shows a summary of existing studies.

Table 1 Comparison review of different studies, using different techniques (one method solely or a combination of two) to prevent COVID-19 spread

C. Contributions

Our goal in this research is to enrich COVID-19 prevention systems and to compare the integrated algorithm with other state-of-the-art methodologies. An AI-enabled technology can improve the overall situation by minimizing lockdown phases, with surveillance, detection, and monitoring systems implemented using DL models and IoT-embedded devices as the core solution to the ongoing pandemic. The contributions of this work are summarized as follows:

This integrated approach can help prevent the spread of COVID-19 by fusing three different YOLOv4-tiny object detectors to monitor social distancing, detect face masks, and measure facial temperature simultaneously and in real time.
The proposed YOLOv4-tiny can perform object detection and tracking much faster than other state-of-the-art deep learning models. Despite its smaller size, YOLOv4-tiny still achieves high accuracy in detecting objects for real-time applications.
Executing the proposed models on NVIDIA boards (Jetson Nano and Jetson Xavier AGX) showcases their potential scalability and efficiency, paving the way for real-world applications in various scenarios with different trade-offs between cost and performance.
A thermal screening system based on a single thermal camera has been developed to measure the facial temperature of more than one person at once, while the same camera continues to monitor social distancing between pedestrians.

The aim of YOLOv4-tiny in this research is to detect the objects in video frames. Given an input frame, the model processes it through its convolutional neural network to generate bounding box predictions and associated class probabilities. Specifically, we integrated three different YOLOv4-tiny object detectors into the system, each serving a specific task: social distancing monitoring, mask detection, and facial temperature measurement. YOLOv4-tiny is a deep learning model known for its efficiency and suitability for real-time applications, making it a suitable choice for this edge-AI algorithm. The proposed models were trained on different data sets for people detection, mask detection, and facial temperature measurement. These data sets contain a diverse range of samples to ensure robustness and accuracy in different scenarios.

The rest of the paper is organized as follows: Section 2 presents the proposed methodology; Section 3 presents the obtained results and the discussion; Section 4 describes the real-time implementation on NVIDIA edge platforms. Finally, conclusions are drawn in Section 5.

2 Proposed algorithm design methodology

In this work, we implemented the proposed method for multiple tasks: social distancing monitoring, facial temperature measurement, and face mask detection. This approach provides an automated surveillance system that uses video cameras to warn authorities and help them ensure that individuals comply with social distancing regulations and face mask norms, while also measuring their facial temperature, to reduce virus spread. Three YOLOv4-tiny models are used for these tasks. The work started with collecting the data sets for the three tasks. Then, we trained and tested the YOLOv4-tiny models to evaluate their performance and robustness. The final prototype runs as a standalone application on the embedded system (Jetson Nano or Jetson Xavier AGX), which is connected to the monitoring system. We used a visible video camera for face mask detection and a thermal camera for social distance classification and facial temperature measurement. The visible and thermal cameras are installed on the NVIDIA devices and operated simultaneously. Figure 1 shows the integrated approach for face mask detection, social distancing, and facial temperature video measurement.

A. Face mask detection

The face mask images were collected from various sources on the internet. We selected people of different ages in indoor and outdoor public places. 900 images were used for this experiment. The selected images include single faces and crowded groups of individuals seen from different angles. We selected different types of masks with different colors, see Fig. 2. A data annotation tool was used to label the targeted faces in the images. Various kinds of data annotation exist, such as image and video annotation, key-point annotation, and polygonal segmentation annotation. LabelImg was used to label object bounding boxes on the images; this tool can save annotations in different formats. A YOLOv4-tiny model has been designed and trained for face mask detection. Figure 3 shows the workflow for designing and training YOLOv4-tiny for face mask detection. The proposed approach aims to build a custom real-time model for face mask detection.

B. Social distancing

In this research, a YOLOv4-tiny model is used for human detection. 2000 thermal images were collected from various sources. This data set consists of thermal images of people acquired in different realistic indoor and outdoor environments. These thermal images contain natural scenes of human activity, including walking, talking, standing, and sitting. A custom annotation tool was used to label persons with bounding boxes. We compute the centroid of each detected bounding box and use the Euclidean distance formula to measure the distance between centroids. In this work, the safe social distance is set to 6 feet. We assigned two thresholds for violations: warning, shown in yellow, and dangerous, shown in red. If the distance between detected people is less than or equal to 5 feet, the bounding box color is set to red. The bounding box color changes to yellow when the distance between detected people is more than 5 feet and less than or equal to 6 feet. When the distance between the detected persons is more than 6 feet, the bounding box color is set to green, meaning that social distancing is safely maintained.
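
The color rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: it assumes that boxes are given as (x_min, y_min, x_max, y_max) tuples whose coordinates are already calibrated in feet on the ground plane, and the names `centroid` and `classify_distances` are hypothetical.

```python
import math
from itertools import combinations

DANGER_FT = 5.0  # distance <= 5 ft: red (dangerous)
WARN_FT = 6.0    # 5 ft < distance <= 6 ft: yellow (warning)


def centroid(box):
    """Center of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def classify_distances(boxes):
    """Return one color per box: 'red', 'yellow', or 'green'.

    Each box keeps the most severe color found over all pairs it is in.
    Coordinates are assumed to be in feet (an assumption of this sketch).
    """
    severity = {"green": 0, "yellow": 1, "red": 2}
    colors = ["green"] * len(boxes)
    centers = [centroid(b) for b in boxes]
    for i, j in combinations(range(len(boxes)), 2):
        d = math.dist(centers[i], centers[j])  # Euclidean distance
        if d <= DANGER_FT:
            level = "red"
        elif d <= WARN_FT:
            level = "yellow"
        else:
            continue  # this pair keeps a safe distance
        for k in (i, j):
            if severity[level] > severity[colors[k]]:
                colors[k] = level
    return colors
```

Keeping the most severe color per person is a design choice of this sketch: a person who is 4 feet from one neighbor and 5.5 feet from another is drawn in red.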

The proposed approach has been implemented with a bird's eye view to remove the perspective distortion of the video camera. The top-down view improves the scalability of the social distancing estimation system, because the video camera does not have to be set up in a specific way: neither the camera's height nor its inclination angle needs to be determined. Instead, the user only needs to click four points on the captured video image that serve as the corner points of the ground plane, which is then transformed into a top-down view together with the detected targets. These points must form a four-sided region with at least two opposite sides parallel. This makes the system easy to adopt if it is turned into a product.
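
The four-point calibration amounts to estimating a planar homography and applying it to each detected centroid. The sketch below shows one way to do this in pure Python, under the assumption that the four clicked image points and their target top-down positions are known; the function names are hypothetical, and this is not the authors' implementation. In practice, OpenCV's `cv2.getPerspectiveTransform` performs the same estimation.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 matrix H (with H[2][2] = 1) mapping four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per point correspondence.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def to_top_down(h, point):
    """Project an image point (e.g., a box centroid) onto the top-down view."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)
```

For example, `h = homography([(1, 3), (5, 3), (6, 5), (0, 5)], [(0, 0), (4, 0), (4, 6), (0, 6)])` maps a trapezoid seen by the camera onto a rectangle, after which `to_top_down(h, centroid)` gives the position used for the distance check.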

C. Facial temperature measurement

Facial images were taken from [50], see Fig. 4. Most of the facial thermal data were collected in indoor and outdoor environments. The images were acquired with a thermal video camera from different scenes, including people in different body positions and with different facial expressions. 9,982 images were used for this work. The thermal images were inverted to obtain negative images, and gamma correction was applied to these negative images to improve their visibility, which enhanced the brightness of the captured facial features. The proposed system calculates the average temperature of each individual's face based on pixel interpolation from a given image frame. The process begins with the function get_person_temperature, which takes a list of bounding boxes (boxs) and an input image frame (frame). It iterates over each bounding box in the list and extracts the region of interest (ROI) from the input image, assuming that the ROI contains the person's face. Python and appropriate libraries (e.g., OpenCV or PyTorch) are used to read the image and extract raw pixel values. By analyzing the pixels in the ROI, the code calculates the average value. This value is then mapped from its original range to the temperature range 36–38 °C using a custom map_function, allowing for better representation and visualization. Finally, the obtained raw pixel values are converted into integers.
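
A minimal sketch of this procedure is given below. The names `get_person_temperature`, `boxs`, `frame`, and `map_function` come from the description above, but their signatures and bodies are assumptions made for illustration: the frame is taken to be a 2-D list of 8-bit intensities, each box is taken to be an (x, y, w, h) tuple in pixels, and the input range 0 to 255 is assumed to map linearly onto 36 to 38 °C.

```python
def map_function(value, in_min, in_max, out_min, out_max):
    """Linearly rescale value from [in_min, in_max] to [out_min, out_max]."""
    return (value - in_min) * (out_max - out_min) / (in_max - in_min) + out_min


def get_person_temperature(boxs, frame):
    """Return one mapped temperature (deg C) per face box, or None if empty.

    boxs  -- list of (x, y, w, h) face boxes (assumed format)
    frame -- 2-D list of pixel intensities in 0..255 (assumed format)
    """
    temperatures = []
    for x, y, w, h in boxs:
        # Region of interest: rows y..y+h, columns x..x+w.
        roi = [row[x:x + w] for row in frame[y:y + h]]
        pixels = [p for row in roi for p in row]
        if not pixels:
            temperatures.append(None)  # box lies outside the frame
            continue
        mean_value = sum(pixels) / len(pixels)
        temperatures.append(map_function(mean_value, 0, 255, 36.0, 38.0))
    return temperatures
```

Because the output range is fixed by the mapping, values produced this way are a display-oriented rescaling of pixel intensity; with a radiometric camera, per-pixel temperature readings could be averaged directly instead.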

D. Model building and training

YOLOv4-tiny is a deep convolutional neural network designed for object detection and recognition. It is a smaller and faster version of the original YOLOv4 model but still maintains high accuracy and precision in detecting objects in images and videos. Its lightweight nature makes it suitable for mobile and embedded devices, which are becoming increasingly popular for real-time applications. With the rise of the Internet of Things (IoT), there is a growing need for low-power, low-cost devices that can perform real-time object detection. YOLOv4-tiny is well suited to this task, as it can run on devices with limited processing power and memory. The YOLOv4 backbone is CSPDarknet53, a convolutional neural network built on DarkNet-53 and designed for object detection. It employs the CSPNet strategy of partitioning the feature map of the base layer into two parts and then merging them through a cross-stage hierarchy. This split-and-merge strategy allows more gradient flow through the network. Figure 5 shows the structure of the YOLOv4-tiny model. Its convolutional layers have been compressed to 29 layers to achieve fast detection. As a result, YOLOv4-tiny reaches up to 371 fps, which meets the requirements of real-time applications. The YOLOv4-tiny model uses the CSPDarknet53-tiny network as its backbone in place of the CSPDarknet53 network used in the YOLOv4 architecture. The CSPDarknet53-tiny network uses the CSP-Block module of the cross-stage partial network instead of the Res-Block module of the residual network. The CSP-Block divides the feature map into two parts, so that the gradient flow can propagate along two different network paths. The CSP-Block can enhance the learning ability of the CNN compared with the Res-Block, although this gain in accuracy comes with increased computation. To minimize the computational cost, YOLOv4-tiny removes the computational bottlenecks with the highest overhead in the CSP-Block. This improves the accuracy of the YOLOv4-tiny model at constant or even reduced computation. To further simplify the computation, YOLOv4-tiny uses the Leaky-ReLU function as its activation function instead of the Mish activation function used in YOLOv4, see Eq. (1). The Leaky-ReLU function is

$$y_{i}=\begin{cases}\dfrac{x_{i}}{a_{i}}, & \text{if } x_{i}<0\\[4pt] x_{i}, & \text{if } x_{i}\ge 0,\end{cases}$$

(1)

where ${a}_{i}\in (1,+\infty )$ is a constant value.
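
Eq. (1) can be illustrated with a short Python function. The default value a = 10 (a negative-side slope of 0.1) is an assumption chosen for illustration, not a value stated in the text.

```python
def leaky_relu(x, a=10.0):
    """Leaky-ReLU as in Eq. (1): x / a for x < 0, x otherwise.

    a must lie in (1, +inf); the default of 10.0 is an illustrative
    assumption (slope 0.1 on the negative side).
    """
    if a <= 1.0:
        raise ValueError("a must be greater than 1")
    return x if x >= 0 else x / a
```

Unlike the plain ReLU, negative inputs are scaled down rather than set to zero, so a small gradient still flows for negative activations.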

3 Experiment results and discussion

A. Evaluation metrics

In this research, we used the following performance metrics [51, 52] to evaluate the proposed framework: accuracy, recall, and precision, see Eq. (2), where $TP$ (true positives), $TN$ (true negatives), $FP$ (false positives), and $FN$ (false negatives) are taken from the confusion matrix. Accuracy is the number of correct predictions divided by the total number of samples in the data set. Precision is the fraction of positive predictions that are correct; it indicates how many of the selected items are relevant. Finally, recall is the ability of the model to find all the relevant cases within the given data set:

$$\begin{aligned} \text{Accuracy} & = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \\ \text{Precision} & = \frac{\text{TP}}{\text{TP} + \text{FP}} \\ \text{Recall} & = \frac{\text{TP}}{\text{TP} + \text{FN}}. \end{aligned}$$

(2)
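
The three metrics in Eq. (2) can be computed directly from the confusion-matrix counts, as in the following sketch. The function name `detection_metrics` is hypothetical, and the counts in the usage note are made-up numbers for illustration only, not results from this study.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) as defined in Eq. (2).

    A metric whose denominator is zero is reported as 0.0 (a convention
    chosen for this sketch).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

For example, with the illustrative counts TP = 90, TN = 5, FP = 3, FN = 2, accuracy is 95/100, precision is 90/93, and recall is 90/92.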

B. Results of the proposed system

This subsection gives the full description of the experimental results of this study. The proposed system operates automatically for social distancing, face mask detection, and facial temperature measurement. The evaluation has been performed on the testing data sets for the three tasks. The images were acquired in different realistic situations, including indoor and outdoor environments. In addition, we trained other DL models, namely YOLO, YOLOv2, YOLOv3-tiny, and Faster R-CNN, to assess the performance of the proposed YOLOv4-tiny against these object detection architectures on the same training/testing data sets. According to the experimental results, see Fig. 7, YOLOv4-tiny outperforms the other DL models on the three tasks (person detection, mask detection, and facial detection for temperature measurement). The first YOLOv4-tiny model, for person detection, has been assessed on thermal videos and showed promising results with the social distancing classification algorithm.

The key challenge for social distancing is accurately measuring the actual distance between the detected individuals in the thermal videos. The top-down view approach corrects the perspective and is used to transform the video images from the 2-D camera view to a bird's eye view. The centroids of the detected bounding boxes are transformed from the input image onto the top-down view, and then the social distance classification is performed. In addition, violations of the social distance thresholds are highlighted through the bounding box colors assigned to the individuals. Simultaneously, the second YOLOv4-tiny model performs facial detection to measure each individual's temperature. The pixels enclosed by the face bounding boxes, which are drawn in blue, are averaged, mapped, and converted into numbers, see Fig. 8.

We examined the third YOLOv4-tiny model to detect whether people wear face masks. Green indicates people wearing face masks, while red indicates those not wearing them. In addition, one of two labels (Mask or No Mask) is shown on top of each detected bounding box, see Fig. 9. Some false negatives and false positives were noted in this experiment; nevertheless, the proposed model achieved promising results in detecting real-time interactions among individuals. The proposed work for social distancing achieved better results than the method in [8], which used two data sets of thermal images and a customized lightweight YOLOv2 architecture for object detection. YOLOv4-tiny represents a significant improvement over YOLOv2 in various aspects. It has a more powerful backbone network, CSPDarknet53-tiny, leading to enhanced feature extraction and better object detection performance. The proposed techniques are compared with other methodologies for social distance measurement and face mask detection on the basis of accuracy [53,54,55,56,57,58]. These methods used data sets for social distancing and mask detection that differ from those used in this work. The proposed approach achieved an accuracy of 96.2% for social distance measurement, 95.1% for face mask detection, and 96% for the facial temperature model. Furthermore, YOLOv4-tiny uses anchor boxes to detect objects at different scales and aspect ratios. This architecture enables faster and more accurate object detection than the MobileNet single shot detector (SSD) used in [54]. In addition, YOLOv4-tiny is more robust to occlusion and detects small objects better than the CV and IoT algorithm used in [53], because it uses a better feature extractor that captures more detailed features of objects in the images. Nagrath et al. [56] use MobileNetV2 for face mask detection. This convolutional neural network architecture has gained popularity due to its lightweight and efficient design, which makes it a suitable choice for mobile and embedded devices. However, despite its advantages, the MobileNetV2 architecture has some drawbacks and limitations, notably the lack of the residual connections present in other deep learning models, such as ResNet and YOLOv4-tiny. These connections allow information to flow directly from one layer to another, facilitating the training of deeper networks. Without them, the model may suffer from the vanishing gradient problem, which makes training difficult. Tables 4 and 5 compare the model accuracy with other social distancing and mask detection methods. YOLOv4-tiny made it possible to monitor COVID-19 prevention measures, namely social distancing, face mask wearing, and facial temperature, among individuals.

Table 4 YOLOv4-tiny vs. other methods for social distancing

Table 5 YOLOv4-tiny vs. other face masks/no mask detection methods

4 Real-time edge implementation

The final designed models have been executed in real time on resource-constrained NVIDIA edge platforms. We used Jetson Xavier and Jetson Nano to execute the proposed architectures. Table 6 compares the two platforms. The Jetson Nano features a 128-core Maxwell GPU and a quad-core ARM A57 CPU, and delivers 472 GFLOPS of AI performance. It is equipped with 4 GB of 64-bit LPDDR4 RAM and a microSD card slot for storage, and supports a maximum resolution of 4K @ 30 fps. Supported AI frameworks include TensorFlow, PyTorch, and Caffe. In contrast, the Jetson Xavier has a 512-core GPU and an 8-core ARMv8.2 CPU, providing 30 TOPS of AI performance. It comes with 16 GB of 256-bit LPDDR4x RAM and 16 GB of eMMC flash storage, and supports 2 × 4K @ 30 fps resolution. In addition, it supports AI frameworks and libraries such as TensorFlow, PyTorch, Caffe, cuDNN, and CUDA, among others. However, the Jetson Xavier consumes more power, ranging from 10 to 30 W, while the Jetson Nano's power consumption lies between 5 and 10 W. This research integrates face mask detection with social distancing and facial temperature measurement, and examines the execution of multiple DL models on a single NVIDIA board. Different cameras were used in this work: a Raspberry Pi camera (model 2.1) and a See3CAM camera as visible cameras for face mask detection, and Lepton 3.5 and FLIR Boson cameras as thermal cameras for social distancing and facial temperature measurement. The Lepton and Raspberry Pi cameras were connected to the Jetson Nano; the Boson and See3CAM cameras were connected to the Jetson Xavier AGX. The thermal cameras are radiometric, so a measurement can be extracted for every pixel in the image. Therefore, thanks to OpenCV and its supporting libraries, the color map of the image is converted to an array of integer temperature values that can be read as numbers. We resized the frame height and width of each camera output to 416 × 416. The proposed integrated approach has been executed on both NVIDIA edge platforms. Based on the experimental results, the two cameras simultaneously produced face mask detection, facial temperature, and social distancing classification on the centralized monitoring system, see Fig. 10.

Table 6 Specification for the proposed NVIDIA platforms (Jetson Nano & Jetson Xavier)

We recorded the real-time detection rate and power consumption on both NVIDIA edge platforms to assess the performance of the proposed techniques, namely social distancing (SD), mask detection (MD), and facial temperature measurement (FTM), under different algorithm running scenarios, see Tables 7 and 8. When the three models run together, real-time detection performance decreases due to the increased computation cost. Furthermore, the temperature of the Jetson Nano is higher than that of the Jetson Xavier when the proposed approach runs the three tasks simultaneously, which triggers an over-temperature alarm and degrades the performance of the Jetson Nano, see Fig. 11b. This temperature difference is attributed to the increased workload on the Jetson Nano as it struggles to handle the simultaneous execution of the three tasks, leading to elevated heat generation and potentially affecting overall performance.

Table 7 Real-time detection for proposed method on Jetson Nano

Table 8 Real-time detection for proposed method on Jetson Xavier AGX

This research compares the proposed approach with other methodologies, including pre-trained neural network models. An advantage of the integrated technique is the small disk storage size of its models: 22.9 MB for the social distancing YOLOv4-tiny model, 22.8 MB for the facial temperature YOLOv4-tiny model, and 23 MB for the mask detection YOLOv4-tiny model, since these architectures have few learnable parameters. This makes them executable on low-cost IoT devices. In contrast, other methodologies use pre-trained CNN layers that require large disk storage, such as the ResNet50 model [42]. In addition, the speed of these pre-trained models is very low for real-time applications on low-cost embedded devices, which impairs their use for detection in videos and images. The proposed DL algorithms for the three tasks use lightweight and efficient deep learning models that are specifically designed to run on resource-constrained devices. Techniques such as model quantization, pruning, and knowledge distillation are applied to reduce the models' size and computational complexity while preserving accuracy to the extent possible. NVIDIA devices (Jetson Nano & Jetson Xavier) integrate specialized hardware accelerators such as GPUs (graphics processing units). These accelerators are optimized for matrix operations and other machine learning tasks, significantly speeding up the computations required for AI processing. To process multiple features simultaneously, NVIDIA devices leverage parallel computing techniques. They split the workload across the multiple cores or threads available on the device's processor, allowing the algorithm to handle multiple inputs and outputs concurrently.
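
The idea of dispatching the three tasks concurrently can be sketched with Python's standard `concurrent.futures` module. This only illustrates the dispatch pattern and is not the authors' implementation: the three detector functions are hypothetical stand-ins that return empty results instead of running YOLOv4-tiny, and whether threads yield a real speed-up depends on the inference runtime used on the device.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the three YOLOv4-tiny models.
def detect_people(thermal_frame):
    return {"task": "SD", "source": thermal_frame, "boxes": []}


def detect_faces(thermal_frame):
    return {"task": "FTM", "source": thermal_frame, "boxes": []}


def detect_masks(visible_frame):
    return {"task": "MD", "source": visible_frame, "boxes": []}


def process_frame_pair(thermal_frame, visible_frame, pool):
    """Run the three detectors concurrently on one thermal/visible pair."""
    jobs = {
        "SD": pool.submit(detect_people, thermal_frame),
        "FTM": pool.submit(detect_faces, thermal_frame),
        "MD": pool.submit(detect_masks, visible_frame),
    }
    # result() blocks until the corresponding detector has finished.
    return {name: job.result() for name, job in jobs.items()}
```

A caller would create one `ThreadPoolExecutor(max_workers=3)` at start-up and reuse it for every frame pair, so that thread creation cost is not paid per frame.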

5 Conclusion

This research presented social distancing monitoring, mask detection, and facial temperature measurement as an integrated approach executed in real time on a single NVIDIA board. It assesses the ability of low-cost embedded systems to run multiple deep learning models simultaneously. The proposed vision-based system can be used in any indoor or outdoor environment, such as public areas, train stations, streets, shopping centers, and smart cities, where its performance is sufficient for the purpose. The proposed work helps ensure safe conditions between individuals. In addition, the developed deep learning models were validated through multiple experiments and achieved promising results. Jetson boards offer low power consumption relative to their computing power. We performed different experiments on the Jetson Nano and Jetson Xavier AGX with different algorithm scenarios. The highest real-time performance was obtained on the Jetson Xavier AGX, which achieved 18 fps from the thermal camera and 62 fps from the visible camera when the three YOLOv4-tiny-based models executed at the same time. This research also notes that the improved real-time detection on the Jetson Xavier AGX comes with increased power demand compared with the Jetson Nano, because the GPU architecture of the Jetson Xavier AGX continuously consumes a large amount of energy. As future work, the recently released YOLOv7 will be considered for the integrated approach.

References

Team, T.V., D.J.: Coronavirus: a visual guide to the outbreak. 6 Mar. 2020, https://www.bbc.co.uk/news/world-51235105. Accessed 07 Nov 2022
Nalbandian, A., Sehgal, K., Gupta, A., et al.: Post-acute COVID-19 syndrome. Nat. Med. 27, 601–615 (2021)
Soba, D., et al.: Traffic restrictions during COVID-19 lockdown improve air quality and reduce metal biodeposition in tree leaves. Urban For. Urban Green. 70, 127542 (2022)
Hsiang, S., Allen, D., Annan-Phan, S., et al.: The effect of large-scale anti-contagion policies on the COVID-19 pandemic. Nature 584, 262–267 (2020)
Goniewicz, K., Khorram-Manesh, A.: Maintaining social distancing during the COVID-19 outbreak. Soc. Sci. 10, 14 (2021). https://doi.org/10.3390/socsci10010014
Mahmoudi, J., Xiong, C.: How social distancing, mobility, and preventive policies affect COVID-19 outcomes: big data-driven evidence from the District of Columbia-Maryland-Virginia (DMV) megaregion. PLoS ONE 17(2), e0263820 (2022)
Somaldo, P., Ferdiansyah, F.A., Jati, G., Jatmiko, W.: Developing smart COVID-19 social distancing surveillance drone using YOLO implemented in robot operating system simulation environment. In: 2020 IEEE 8th R10 Humanitarian Technology Conference (R10-HTC), Kuching, Malaysia, 2020, pp. 1–6, https://doi.org/10.1109/R10-HTC49770.2020.9357040
Saponara, S., Elhanashi, A., Gagliardi, A.: Implementing a real-time, AI-based, people detection and social distancing measuring system for Covid-19. J. Real-Time Image Proc. 18, 1937–1947 (2021). https://doi.org/10.1007/s11554-021-01070-6
Zhang, L., Zhu, Y., Jiang, M., Wu, Y., Deng, K., Ni, Q.: Body temperature monitoring for regular COVID-19 prevention based on human daily activity recognition. Sensors (Basel). 21(22), 7540 (2021). https://doi.org/10.3390/s21227540
Safiabadi Tali, S.H., LeBlanc, J.J., Sadiq, Z., Oyewunmi, O.D., Camargo, C., Nikpour, B., Armanfard, N., Sagan, S.M., Jahanshahi-Anbuhi, S.: Tools and techniques for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)/COVID-19 detection. Clin. Microbiol. Rev. 34(3), e00228-e320 (2021). https://doi.org/10.1128/CMR.00228-20
Dell’Isola, G.B., Cosentini, E., Canale, L., Ficco, G., Dell’Isola, M.: Noncontact body temperature measurement: uncertainty evaluation and screening decision rule to prevent the spread of COVID-19. Sensors 21, 346 (2021)
Zhou, Z., et al.: Temperature dependence of the SARS-CoV-2 affinity to human ACE2 determines COVID-19 progression and clinical outcome. Comput. Struct. Biotechnol. J. 19, 161–167 (2021)
Saponara, S., Elhanashi, A., Zheng, Q.: Recreating fingerprint images by convolutional neural network autoencoder architecture. IEEE Access 9, 147888–147899 (2021)
Zheng, Q., et al.: Improvement of generalization ability of deep CNN via implicit regularization in a two-stage training process. IEEE Access 6, 15844–15869 (2018)
Tang, C., Feng, Y., Yang, X., Zheng, C., Zhou, Y.: The object detection based on deep learning. In: 2017 4th International Conference on Information Science and Control Engineering (ICISCE), 2017, pp. 723–728, https://doi.org/10.1109/ICISCE.2017.156.
Il Lee, S., Kim, H.: Instant and accurate instance segmentation equipped with path aggregation and attention gate. In: 2020 International SoC Design Conference (ISOCC), 2020, pp. 320–321
Zheng, Q., Zhao, P., Li, Y., et al.: Spectrum interference-based two-level data augmentation method in deep learning for automatic modulation classification. Neural Comput. Appl. 33, 7723–7745 (2021). https://doi.org/10.1007/s00521-020-05514-1
Zheng, Q., Zhao, P., Wang, H., Elhanashi, A., Saponara, S.: Fine-grained modulation classification using multi-scale radio transformer with dual-channel representation. IEEE Commun. Lett. 26(6), 1298–1302 (2022). https://doi.org/10.1109/LCOMM.2022.3145647
Saponara, S., Elhanashi, A.: Impact of Image Resizing on Deep Learning Detectors for Training Time and Model Performance. In: Saponara, S., De Gloria, A. (eds) Applications in Electronics Pervading Industry, Environment, and Society. ApplePies 2021. Lecture Notes in Electrical Engineering, vol 866. Springer, Cham. (2022) https://doi.org/10.1007/978-3-030-95498-7_2.
Jiang, Z., Zhao, L., Li, S., Jia, Y.: Real-time object detection method based on improved YOLOv4-tiny. arXiv preprint
Saponara, S., Elhanashi, A., Zheng, Q.: Developing a real-time social distancing detection system based on YOLOv4-tiny and bird-eye view for COVID-19. J Real-Time Image Proc 19, 551–563 (2022). https://doi.org/10.1007/s11554-022-01203-5
Saponara, S., Elhanashi, A., Gagliardi, A.: Exploiting R-CNN for video smoke/fire sensing in antifire surveillance indoor and outdoor systems for smart cities. In: 2020 IEEE International Conference on Smart Computing (SMARTCOMP), pp. 392–397.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C.: MobileNetV2: Inverted residuals and linear bottlenecks. In: IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
Won, J.-H., Lee, D.-H., Lee, K.-M., Lin, C.-H.: An improved YOLOv3-based neural network for de-identification technology. In: 2019 34th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC), JeJu, Korea (South), 2019, pp. 1–2, https://doi.org/10.1109/ITC-CSCC.2019.8793382
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: IEEE Conference Proceedings on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125
Hohman, F., et al.: Visual analytics in deep learning: an interrogative survey for the next frontiers. IEEE Trans Vis Comput Graph 25(8), 2674–2693 (2018)
Singh, G., Tiwari, S., Singh, J.: Real time object detection using neural networks: a comprehensive survey. In: 2023 Third International Conference on Artificial Intelligence and Smart Energy (ICAIS), Coimbatore, India, 2023, pp 1281–1286, https://doi.org/10.1109/ICAIS56108.2023.10073826
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936–944, https://doi.org/10.1109/CVPR.2017.106.
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988, https://doi.org/10.1109/ICCV.2017.322
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)
Redmon, J.: You only look once: Unified, real-time object detection. In: IEEE CVPR, pp. 779–788. 2016
Redmon, J., et al.: YOLO9000: better, faster, stronger. In: IEEE CVPR 2017
Bochkovskiy, A., Wang, C., Liao, H.: YOLOv4: optimal speed and accuracy of object detection. Comput Sci (2020). arXiv:2004.10934
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, Kauai, HI, USA, 2001, pp. I-I.
Ottakath, N., et al.: ViDMASK dataset for face mask detection with social distance measurement. Displays 73, 102235 (2022)
Farman, H., Khan, T., Khan, Z., Habib, S., Islam, M., Ammar, A.: Real-time face mask detection to ensure COVID-19 precautionary measures in the developing countries. Appl. Sci. 12, 19 (2022)
Javed, I., Butt, M.A., Khalid, S., et al.: Face mask detection and social distance monitoring system for COVID-19 pandemic. Multimed. Tools Appl. 82, 14135–14152 (2023). https://doi.org/10.1007/s11042-022-13913-w
Zhao, M., Jha, A., Liu, Q., Millis, B.A., Mahadevan-Jansen, A., Lu, L., Landman, B.A., Tyska, M.J., Huo, Y.: Faster mean-shift: GPU-accelerated clustering for cosine embedding-based cell segmentation and tracking. Med. Image Anal. 71, 102048 (2021)
Qin, B., Li, D.: Identifying facemask-wearing condition using image super-resolution with classification network to prevent COVID-19. Sensors 20(18), 5236 (2020)
Elhanashi, A., Lowe, D., Saponara, S., Moshfeghi, Y.: Deep learning techniques to identify and classify COVID-19 abnormalities on chest x-ray images. In: Proc. SPIE 12102, Real-Time Image Processing and Deep Learning 2022
Greenhalgh, T., Schmid, M.B., Czypionka, T., Bassler, D., Gruer, L.: Face masks for the public during the COVID-19 crisis. BMJ 369, m1435 (2020). https://doi.org/10.1136/bmj.m1435
Salagrama, S., Kumar, H.H., Nikitha, R., Prasanna, G., Sharma, K., Awasthi, S.: Real time social distance detection using Deep Learning. In: 2022 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES), Greater Noida, India, 2022, pp. 541–544, https://doi.org/10.1109/CISES54857.2022.9844327.
Vibhuti, Jindal, N., Singh, H., et al.: Face mask detection in COVID-19: a strategic review. Multimed. Tools Appl. 81, 40013–40042 (2022). https://doi.org/10.1007/s11042-022-12999-6
Wu, Y., Zhang, Q., Li, L., Li, M., Zuo, Y.: Control and prevention of the COVID-19 epidemic in China: a qualitative community case study. Risk Manag. Health Policy. 9(14), 4907–4922 (2021). https://doi.org/10.2147/RMHP.S336039. (PMID:34916861;PMCID:PMC8668872)
Zhao, Q., Wang, Y., Yang, M., et al.: Evaluating the effectiveness of measures to control the novel coronavirus disease 2019 in Jilin Province, China. BMC Infect. Dis. 21, 245 (2021). https://doi.org/10.1186/s12879-021-05936-9
Dzien, C., Halder, W., Winner, H., et al.: Covid-19 screening: are forehead temperature measurements during cold outdoor temperatures helpful? Wien Klin Wochenschr 133, 331–335 (2021). https://doi.org/10.1007/s00508-020-01754-2
Prasad, J., Jain, A., Velho, D., Sendhil Kumar, K.S.: COVID vision: an integrated face mask detector and social distancing tracker. Int. J. Cognit. Comput. Eng. 3, 106–113 (2022). (ISSN 2666-3074)
Varshini, B., Yogesh, H.R., Pasha, S., Suhail, M., Madhumitha, V., Sasi, A.: IoT-enabled smart doors for monitoring body temperature and face mask detection. Glob. Trans. Proc. (2021). https://doi.org/10.1016/j.gltp.2021.08.071
Lippi, G., Nocini, R., Mattiuzzi, C., Henry, B.M.: Is body temperature mass screening a reliable and safe option for preventing COVID-19 spread? Diagnosis (Berl). 9(2), 195–198 (2021). https://doi.org/10.1515/dx-2021-0091. (PMID: 34472762)
Kuzdeuov, A., Aubakirova, D., Koishigarina, D., Varol, H.A.: TFW: annotated thermal faces in the wild dataset. IEEE Trans. Inf. Forensics Secur. 17, 1–11 (2022)
Dini, P., Saponara, S.: Analysis, design, and comparison of machine-learning techniques for networking intrusion detection. Designs 5(1), 9 (2021)
Dini, P., et al.: Design and testing novel one-class classifier based on polynomial interpolation with application to networking security. IEEE Access 10, 67910–67924 (2022)
Giuliano, R., Innocenti, E., Mazzenga, F., Vegni, A.M., Vizzarri, A.: IMPERSONAL: an IoT-Aided computer vision framework for social distancing for health safety. IEEE Internet of Things J. 9(10), 7261–7272 (2022). https://doi.org/10.1109/JIOT.2021.3097590
Ahamad, A.H., Zaini, N., Latip, M.F.A.: Person detection for social distancing and safety violation alert based on segmented ROI. In: 2020 10th IEEE International Conference on Control System, Computing and Engineering (ICCSCE), Penang, Malaysia, 2020, pp. 113–118, https://doi.org/10.1109/ICCSCE50387.2020.9204934
Gopal, B., Ganesan, A.: Real time deep learning framework to monitor social distancing using improved single shot detector based on overhead position. Earth Sci. Inform. 15, 585–602 (2022). https://doi.org/10.1007/s12145-021-00758-4
Nagrath, P., et al.: SSDMNV2: A real-time DNN-based face mask detection system using single shot multibox detector and MobileNetV2. Sustain. Cities Soc. 66, 102692 (2021). https://doi.org/10.1016/j.scs.2020.102692
Teboulbi, S., Messaoud, S., Hajjaji, M.A., Mtibaa, A.: Real-time implementation of AI-based face mask detection and social distancing measuring system for COVID-19 prevention. Sci. Program. 2022, 8340779 (2022)
Chen, Q., Sang, L.: Face-mask recognition for fraud prevention using Gaussian mixture model. J. Vis. Commun. Image Represent. 55, 795–801 (2018)
Li, Y., et al.: Cropping and attention based approach for masked face recognition. Appl. Intell. 51, 3012–3025 (2021)

Acknowledgements

We thank the Re-Start Toscana COVID-19 project and the Testarossa EuroHPC project for their support.

Funding

Open access funding provided by Università di Pisa within the CRUI-CARE Agreement.

Author information

Authors and Affiliations

Ingegneria Informazione, University of Pisa, Pisa, Italy
Abdussalam Elhanashi, Sergio Saponara & Pierpaolo Dini
School of Intelligent Engineering, Shandong Management University, Jinan, 250357, Shandong, China
Qinghe Zheng
Department of Information Engineering, Hiroshima University, Hiroshima, Japan
Daiki Morita & Bisser Raytchev

Contributions

AE carried out the experiments and wrote the main manuscript text with support from PD. BR and DM contributed to the final version of the manuscript. SS supervised the project.

Corresponding author

Correspondence to Abdussalam Elhanashi.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Elhanashi, A., Saponara, S., Dini, P. et al. An integrated and real-time social distancing, mask detection, and facial temperature video measurement system for pandemic monitoring. J Real-Time Image Proc 20, 95 (2023). https://doi.org/10.1007/s11554-023-01353-0

Received: 09 March 2023
Accepted: 01 August 2023
Published: 16 August 2023
DOI: https://doi.org/10.1007/s11554-023-01353-0
