
A Systematic Survey on Human Behavior Recognition Methods

  • Survey Article
  • Published in: SN Computer Science

Abstract

Human behavior is an essential component of social interaction, and identifying and analyzing human behaviors is of great significance in a variety of fields. With the rapid development of computer vision and machine learning technology, intelligent machines have begun to replace human beings in observing, perceiving, and analyzing the explosive growth of image and video data. Human behavior recognition based on computer vision and machine learning has accordingly become a particularly hot research topic in many fields, such as intelligent monitoring, human–computer interaction, smart homes, virtual reality, and medical diagnosis. In this study, we systematically survey the popular methods, algorithms, models, and well-known action datasets used in human behavior analysis over the past two decades. We also discuss the advantages and disadvantages of these methods and present promising directions for future research. The results of this survey reveal that the paradigms of human behavior analysis are shifting from traditional RGB to RGB-D data, from deep learning to more intelligent and automated deep reinforcement learning, and from fixed camera devices to portable devices and channel state information (CSI); paradigms based on automated deep reinforcement learning, portable devices, and CSI are therefore likely to become hot topics in future research on human behavior analysis.





Acknowledgements

This work was supported by the Natural Science Foundation of Shandong Province, China (No. ZR2020MF148).

Author information

Corresponding author: Shouke Wei.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yuan, M., Wei, S., Zhao, J. et al. A Systematic Survey on Human Behavior Recognition Methods. SN COMPUT. SCI. 3, 6 (2022). https://doi.org/10.1007/s42979-021-00932-x

