Abstract
Human behavior is an essential component of social interaction, and identifying and analyzing it is of great significance in a variety of fields. With the rapid development of computer vision and machine learning, intelligent machines have started to replace human beings in observing, perceiving, and analyzing the explosively growing volume of image and video data. Human behavior recognition based on computer vision and machine learning is one such task, and it has become a particularly active research topic in many fields, such as intelligent monitoring, human–computer interaction, smart homes, virtual reality, and medical diagnosis. In this study, we systematically survey the popular methods, algorithms, models, and well-known action datasets used in human behavior analysis over the past two decades. We also discuss the advantages and disadvantages of these methods and present promising directions for future research. The survey reveals that the paradigms of human behavior analysis are shifting from traditional RGB to RGB-D, from deep learning to more intelligent and automated deep reinforcement learning, and from fixed camera devices to portable devices and channel state information (CSI). Paradigms based on automated deep reinforcement learning, portable devices, and CSI are likely to become active topics in future research on human behavior analysis.
Acknowledgements
This work was supported by the Natural Science Foundation of Shandong Province, China (No. ZR2020MF148).
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Yuan, M., Wei, S., Zhao, J. et al. A Systematic Survey on Human Behavior Recognition Methods. SN COMPUT. SCI. 3, 6 (2022). https://doi.org/10.1007/s42979-021-00932-x
DOI: https://doi.org/10.1007/s42979-021-00932-x