Abstract
Given the high rate of accidents and crimes around the world, video surveillance is becoming more important every day, and intelligent surveillance systems are being developed to perform surveillance tasks automatically. Accurate human detection in visual surveillance systems is crucial for diverse application areas. The first step in the detection process is to detect moving objects. Each moving object is then classified as either human or non-human. This human classification step is essential to building an effective surveillance system. In this article, we propose an efficient human detection algorithm that processes regions of interest (ROIs) obtained from a foreground estimation. We use the MobileNetV2 deep convolutional neural network, which is designed for embedded devices, together with a transfer learning approach to build a fine-tuned model that classifies each ROI as human or non-human. We trained the fine-tuned model on the INRIA Person dataset under three scenarios. The resulting models were evaluated extensively on the INRIA test set benchmark and achieved F-scores of 98.35%, 98.72%, and 98.90%, which we consider very satisfactory. The best fine-tuned model, used in the classification stage, achieved an accuracy of 98.42%, a recall of 99.47%, a precision of 98.34%, and an F-score of 98.90%.
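The step from a foreground estimate to candidate ROIs can be illustrated with a minimal sketch. It assumes a binary foreground mask has already been produced by some background-subtraction method. It then uses 8-connected component labeling to turn each foreground blob into a bounding box (x, y, w, h). The function name `extract_rois` and the `min_area` noise threshold are hypothetical. The abstract does not specify the authors' exact ROI procedure, so this is an illustration of the general idea, not their implementation.

```python
from collections import deque


def extract_rois(mask, min_area=1):
    """Return (x, y, w, h) boxes of 8-connected foreground blobs.

    mask: 2-D list of 0/1 values (1 = foreground pixel), assumed to come
          from a prior background-subtraction step.
    min_area: hypothetical threshold; blobs with fewer pixels are dropped as noise.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected blob
            queue = deque([(r, c)])
            seen[r][c] = True
            min_r = max_r = r
            min_c = max_c = c
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                min_r, max_r = min(min_r, y), max(max_r, y)
                min_c, max_c = min(min_c, x), max(max_c, x)
                # Visit the 8 neighbours that are foreground and not yet labeled
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if area >= min_area:
                boxes.append((min_c, min_r, max_c - min_c + 1, max_r - min_r + 1))
    return boxes


mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 1],
]
print(extract_rois(mask))              # [(1, 0, 2, 2), (4, 2, 2, 2)]
print(extract_rois(mask, min_area=4))  # [(1, 0, 2, 2)]
```

In a full pipeline, each box would then be cropped from the original frame and passed to the human/non-human classifier. In practice, a library routine such as OpenCV's contour or connected-component functions would usually replace the hand-written flood fill.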
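The reported metrics follow the standard definitions based on true/false positives and negatives. The F-score is the harmonic mean of precision and recall, $F = 2PR/(P+R)$. The sketch below uses hypothetical helper names and checks that the abstract's figures are internally consistent: a precision of 98.34% and a recall of 99.47% give an F-score of about 98.90%.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall, f_score(precision, recall)


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for the best fine-tuned model
print(round(f_score(0.9834, 0.9947) * 100, 2))  # 98.9
```

Accuracy cannot be recovered from precision and recall alone, because it also depends on the true-negative count, which the abstract does not report.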
Cite this article
Bouafia, Y., Guezouli, L. & Lakhlef, H. Human Detection in Surveillance Videos Based on Fine-Tuned MobileNetV2 for Effective Human Classification. Iran J Sci Technol Trans Electr Eng 46, 971–988 (2022). https://doi.org/10.1007/s40998-022-00512-6
DOI: https://doi.org/10.1007/s40998-022-00512-6