
A survey on intelligent human action recognition techniques

Published in: Multimedia Tools and Applications

Abstract

Human Action Recognition (HAR) is an essential research area in computer vision because it enables automated video monitoring. It has numerous applications, including robotics, video surveillance, health care, elderly monitoring, crowd-behavior analysis, and the detection of aberrant activity. This survey offers the reader an up-to-date overview of the intelligent human activity recognition literature and of recent advances in the field. It reviews state-of-the-art recognition techniques, the challenges associated with identifying human activity, and the publicly available datasets, drawing on an in-depth study of works published from 2010 to 2022 that focus on intelligent techniques. All steps of human action recognition are described together with their techniques, covering datasets for Human Action Recognition, handcrafted-feature methods, Machine Learning (ML), Deep Learning (DL), hybrid deep learning, and the limitations of the area. A comparative analysis between ML and DL approaches shows their relative effectiveness in action recognition; consistent with previous research, deep learning surpasses standard machine learning for recognizing human activities. The survey also examines unexplored aspects of human action recognition that could be exploited to build systems that remain resilient in the presence of these issues, highlights the most pressing problems and research directions, describes all relevant datasets in detail, and shares our opinions and suggestions for future research. Compared with past surveys, this study offers a more systematic description of Human Action Recognition methods with respect to comparability, open problems, and the most recent evaluation techniques.
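Among the handcrafted-feature methods surveyed here are temporal templates such as the Motion History Image (MHI), in which recently moving pixels are bright and older motion decays toward zero. The following is a minimal NumPy sketch, not any specific author's implementation; the synthetic frames, `threshold`, and `tau` values are illustrative assumptions chosen only to make the idea concrete.

```python
import numpy as np

def motion_history_image(frames, threshold=0.1, tau=10):
    """Compute a Motion History Image (MHI): pixels that moved in the
    most recent frame pair get the maximum value tau, and previously
    set pixels decay linearly by 1 per frame. Result is scaled to [0, 1]."""
    mhi = np.zeros(frames[0].shape, dtype=float)
    for prev, curr in zip(frames, frames[1:]):
        moving = np.abs(curr - prev) > threshold   # per-pixel motion mask
        mhi = np.where(moving, tau, np.maximum(mhi - 1.0, 0.0))
    return mhi / tau

# Synthetic clip: a bright 2x2 block sliding rightward across 8x8 frames.
frames = []
for t in range(5):
    f = np.zeros((8, 8))
    f[3:5, t:t + 2] = 1.0
    frames.append(f)

mhi = motion_history_image(frames)
# Columns touched most recently (further right) end up brighter,
# encoding both where and when motion occurred in a single image.
```

In the classic pipeline, such a template is then summarized (e.g., by moment descriptors) and fed to a conventional classifier, which is the handcrafted-feature/ML route that the deep-learning methods in this survey are compared against.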



Data availability

N.A.

Code availability

N.A.

References

  1. Ke SR, Thuc HLU, Lee YJ, Hwang JN, Yoo JH, Choi KH (2013) A review on video-based human activity recognition. Computers 2(2): 88–131. MDPI AG. https://doi.org/10.3390/computers2020088

  2. Gupta N, Gupta SK, Pathak RK et al (2022) Human activity recognition in artificial intelligence framework: a narrative review. Artif Intell Rev 55:4755–4808. https://doi.org/10.1007/s10462-021-10116-x

    Article  Google Scholar 

  3. Laptev I, Lindeberg T (2004) Local descriptors for spatio-temporal recognition. In: International workshop on spatial coherence for visual motion analysis

    Google Scholar 

  4. Gorelick L, BlankM SE, Irani M, Basri R (2005) Actions as space-time shapes. In: The tenth IEEE international conference on computer vision (ICCV’05)

    Google Scholar 

  5. Rodriguez MD, Ahmed J, Shah M (2008) Action of MACH a spatio-temporal maximum average correlation height filter for action recognition. In: 26th IEEE conference on computer vision and pattern recognition, CVPR, pp 1–8

    Google Scholar 

  6. Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (n.d.) HMDB: a large video database for human motion recognition. In: International conference on computer vision, Barcelona, pp 2556–2563. https://doi.org/10.1109/ICCV.2011.6126543

  7. Reddy KK, Shah M (2012) Recognizing 50 human action categories of web videos. Machine Vision and Applications Journal (MVAP)

  8. Soomro K, Zamir AR, Mubarak Shah (2012) UCF101: A dataset of 101 human action classes from videos in the wild, CRCV-TR-12-01

  9. Weinland D, Boyer E, Ronfard R (2007) Action recognition from arbitrary views using 3D exemplars. In: IEEE11th international conference on computer vision, Rio de Janeiro

    Google Scholar 

  10. Kwapisz JR, Weiss GM, Moore SA (2011) Activity recognition using cell phone accelerometers. ACM SIGKDD Explorations Newsl 12(2):74–82. https://doi.org/10.1145/1964897.1964918

    Article  Google Scholar 

  11. Chen C, Jafari R, Kehtarnavaz N (2015) UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In: Proceedings of IEEE international conference on image processing. Canada

    Google Scholar 

  12. Heilbron FC, Escorcia V, Ghanem B, Niebles JC (n.d) ActivityNet: a large-scale video benchmark for human activity understanding. In: IEEE conference on computer vision and pattern recognition (CVPR), Boston M A

  13. Wang J, Nie X, **a Y, Wu Y, Zhu SC (2014) Cross-view action modeling, learning and recognition. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 2649–2656. https://doi.org/10.1109/CVPR.2014.339

    Chapter  Google Scholar 

  14. Rahmani H, Mahmood A, Huynh DQ, Mian A (2014) HOPC: histogram of oriented principal components of 3D pointclouds for action recognition. Lect Notes Comput Sci (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8690 LNCS(PART 2):742–757. https://doi.org/10.1007/978-3-319-10605-2_48/COVER

    Article  Google Scholar 

  15. Shahroudy A, Liu J, Ng T-T, Wang G (n.d.) NTU RGB+D: a large-scale dataset for 3D human activity analysis. In: IEEE conference on computer vision and pattern recognition (CVPR)

  16. Jalal A, Kamal S, Kim D (n.d.) A depth video sensor-based life-logging human activity recognition system for elderly care in smart indoor environments. Sensors 14(7):11735–11759

  17. Liu J, Shahroudy A, Perez M, Wang G, Duan L-Y, Kot AC (n.d.) NTU RGB+D 120: a large-scale benchmark for 3D human activity understanding. In: IEEE transactions on pattern analysis and machine intelligence (TPAMI)

  18. Kay W et al (2017) The Kinetics Human Action Video Dataset. [Online]. Available: http://arxiv.org/abs/1705.06950

  19. Li A, Thotakuri M, Ross DA, Carreira J, Vostrikov A, Zisserman A (2020) The AVA-Kinetics Localized Human Actions Video Dataset, [Online]. Available: http://arxiv.org/abs/2005.00214

  20. Damen D, Doughty H, Farinella GM et al (2022) Rescaling egocentric vision: collection, pipeline and challenges for EPIC-KITCHENS-100. Int J Comput Vis 130:33–55. https://doi.org/10.1007/s11263-021-01531-2

    Article  Google Scholar 

  21. Carreira J, Noland E, Banki-Horvath A, Hillier C, Zisserman A (2018) A Short Note about Kinetics-600, [Online]. Available: http://arxiv.org/abs/1808.01340

  22. Carreira J, Noland E, Hillier C, Zisserman A (2019) A Short Note on the Kinetics-700 Human Action Dataset, [Online]. Available: http://arxiv.org/abs/1907.06987

  23. Monfort M et al (2018) Moments in Time Dataset: one million videos for event understanding, [Online]. Available: http://arxiv.org/abs/1801.03150

  24. Niebles JC, Wang H, Fei-Fei L (n.d.) Unsupervised learning of human action categories using spatio-temporal words. Int J Comput Vis 79:299–318

  25. Calderara S, Cucchiara R, Prati A (n.d.) Action signature: a novel holistic representation for action recognition. In: Proc. IEEE 5th international conference on advanced video and signal-based surveillance, pp 121–128

  26. Kalal Z, Mikolajczyk K, Matas J (2012) Tracking-learning-detection. IEEE Trans Pattern Anal Mach Intell 34(7):1409–1422. https://doi.org/10.1109/TPAMI.2011.239

    Article  Google Scholar 

  27. Iosifidis A, Tefas A, Pitas I (2012) Neural representation and learning for multi-view human action recognition. In: The 2012 international joint conference on neural networks (IJCNN), Brisbane, pp 1–6. https://doi.org/10.1109/IJCNN.2012.6252675

  28. Lu Y et al (2012) A human action recognition method based on Tchebichef moment invariants and temporal templates. In: 2012 4th International Conference on Intelligent Human-Machine Systems and Cybernetics, 2:76–79

    Chapter  Google Scholar 

  29. Ji X, Liu H (2010) Advances in view-invariant human motion analysis: a review. In: IEEE transactions on systems, man, and cybernetics, Part C (applications and reviews), 40(1):13–24. https://doi.org/10.1109/TSMCC.2009.2027608

    Chapter  Google Scholar 

  30. Estevam V, Pedrini H, Menotti D (2021) Zero-shot action recognition in videos: a survey. Neurocomputing 439:59–175. https://doi.org/10.1016/j.neucom.2021.01.036

    Article  Google Scholar 

  31. Pareek P, Thakkar A (n.d.) A survey on video-based human action recognition: recent updates, datasets, challenges, and applications. Artif Intell Rev 54:2259–2322

  32. Dang LM, Min K, Wang H, Piran MJ, Lee CH, Moon HJ (2020) Sensor-based and vision-based human activity recognition: a comprehensive survey. Pattern Recogn 108(107561):31–3203

    Google Scholar 

  33. Beddiar DR, Nini B, Sabokrou M et al (2020) Vision-based human activity recognition: a survey. Multimed Tools Appl 79:30509–30555. https://doi.org/10.1007/s11042-020-09004-3

    Article  Google Scholar 

  34. Zhang H-B, Zhang Y-X, Zhong B, Lei Q, Yang L, Du J-X, Chen D-S (2019) A comprehensive survey of vision-based human action recognition methods. Sensors 19:1005. https://doi.org/10.3390/s19051005

    Article  Google Scholar 

  35. Herath S, Harandi M, Porikli F (2017) Going deeper into action recognition: a survey. Image and Vision Computing 60:4–21. https://doi.org/10.1016/j.imavis.2017.01.010

    Article  Google Scholar 

  36. Singh PK, Kundu S, Adhikary T, Sarkar R, Bhattacharjee D (2021) Progress of Human Action Recognition Research in the Last Ten Years: A Comprehensive Survey. Arch Comput Methods Eng 29:4:2309–2349. https://doi.org/10.1007/S11831-021-09681-9

  37. Jobanputra H, Bavishi J, Doshi N (2019) Human activity recognition: a survey. Procedia Comput Sci 155:698–703. https://doi.org/10.1016/j.procs.2019.08.100

  38. Kong Y, Yun Raymond F (2018) Human action recognition and prediction: a survey. Int J Comput Vis 130:1366–1401

    Article  Google Scholar 

  39. Guangchun C, Yiwen W, Abdullah S, Kamesh N, Bill B (2015) Advances in human action recognition: A survey

  40. Vishwakarma S, Agrawal A (n.d.) A survey on activity recognition and behavior understanding in video surveillance. Vis Comput 29(10):983–1009

  41. Aggarwal JK, Ryoo MS (2011) Human activity analysis. ACM Computing Surveys (CSUR) 43:1–43

    Article  Google Scholar 

  42. Bobick AF, Davis JW (n.d.) The recognition of human movement using temporal templates. IEEE Trans Pattern Anal Mach Intell 23(3):257–267

  43. Sheikh Y, Sheikh M, Shah M (n.d.) Exploring the space of a human action. In: Tenth IEEE Int Conf on Computer Vision, pp 144–149

  44. Amor BB, Su J, Srivastava A (n.d.) Action recognition using rate-invariant analysis of skeletal shape trajectories. Trans Pattern Anal Mach Intell 38:1–13

  45. Wang H, Kläser A, Schmid C, Liu C (n.d.) Action recognition by dense trajectories. CVPR 3169–3176

  46. Laptev I, Lindeberg T (n.d.) Space-time interest points. In: Proc. 9th IEEE Int. Conf. On computer vision, pp 432–439

  47. Dollar P, Rabaud V, Cottrell G, Belongie S (2005) Behavior recognition via sparse Spatio-temporal features. In: IEEE international workshop on visual surveillance and performance evaluation of tracking and surveillance

    Google Scholar 

  48. Bregonzio M, Gong S, **ang T (2009) Recognising action as clouds of space-time interest points. In: 2009 IEEE conference on computer vision and pattern recognition, Miami, pp 1948–1955. https://doi.org/10.1109/CVPR.2009.5206779

  49. Thi TH, Zhang J, Cheng L, Wang L, Satoh S (n.d.) Human action recognition and localization in video using structured learning of local space-time features. IEEE International Conference on Advanced Video and Signal Based Surveillance, pp 204–211

  50. Sadek S, Al-Hamadi A, Michaelis B, Sayed U (n.d.) An action recognition scheme using fuzzy log-polar histogram and temporal self-similarity. EURASIP J Adv Signal Process

  51. Chaudhry R, Ravichandran A, Hager G, Vidal R (n.d.) Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. In: IEEE computer Soc. Conf. Computer. Vis. Pattern recognition work. CVPR work. IEEE, pp 1932–1939

  52. Yuan C, Li X, Hu W, Ling H, Maybank S (n.d.) 3D R transform on spatio-temporal interest points for action recognition. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 724–730

  53. Sahoo SP, Silambarasi R, Ari S (n.d.) Fusion of histogram-based features for human action recognition. In: 5th international conference on advanced computing & communication systems, pp 1012–1016

  54. Gupta S, Mazumdar S, Student M (2013) Sobel edge detection algorithm.

  55. Teoh SH, Ibrahim H (n.d) Median filtering frameworks for reducing impulse noise from grayscale digital images: a literature survey. Int J Future Comput Commun 1:323–326

  56. Le QV, Zou WY, Yeung SY, Ng AY (n.d.) Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 3361–3368

  57. Darrell T, Pentland A (n.d.) Space-time gestures. In: Proc. IEEE computer society Conf. On computer vision and pattern recognition, pp 335–340

  58. Jiang H, Drew MS, Li ZN (n.d.) Successive convex matching for action detection. In: IEEE computer society Conf. On computer vision and pattern recognition, pp 1646–1653

  59. Oliver NM, Rosario B, Pentland AP (n.d.) A Bayesian computer vision system for modelling human interactions. IEEE Trans Pattern Anal Mach Intell 22(8):831–843

  60. Shi Q, Cheng L, Wang L, Smola A (n.d.) Human action segmentation and recognition using discriminative semi-Markov models. Int J Comput Vis 93:22–32

  61. Oliver N, Horvitz E, Garg A (n.d) Layered representations for human activity recognition. In: Proc. 4th IEEE Int. Conf. On multimodal interfaces, pp 3–8

  62. Zhang D, Gatica-Perez D, Bengio S, McCowan I (n.d.) Modelling individual and group actions in meetings with layered hmms. IEEE Trans Multimed 8(3):509–520

  63. Nguyen NT, Phung DQ, Venkatesh S, Bui H (n.d.) Learning and detecting activities from movement trajectories using the hierarchical hidden Markov model, IEEE computer society Conf on computer vision and pattern recognition, pp 955–960

  64. Shi Y, Huang Y, Minnen D, Bobick A, Essa I (n.d.) Propagation networks for recognition of partially ordered sequential action. In: Proc. of IEEE computer society Conf. On computer vision and pattern recognition, pp 862–869

  65. Iosifidis A, Tefas A, Pitas I (n.d.) Action-based person identification using fuzzy representation and discriminant learning. IEEE Trans Inf Forensics Secur 7:530–542

  66. Xu W, Miao Z, Zhang X, Tian Y (n.d.) Learning a hierarchical spatio-temporal model for human activity recognition. In: International conference on acoustics, speech and signal processing (ICASSP). IEEE, New Orleans, pp 1607–1611

  67. Kitani KM, Sato Y, Sugimoto A (2007) Recovering the basic structure of human activities from a video-based symbol string. In: 2007 IEEE workshop on motion and video computing (WMVC'07), Austin, p 9. https://doi.org/10.1109/WMVC.2007.34

  68. Ivanov Y, Bobick A (n.d.) Recognition of visual activities and interactions by stochastic parsing. IEEE Trans Pattern Anal Mach Intell 22:852–872

  69. Moore D, Essa I (n.d.) Recognizing multitasked activities from video using stochastic context-free grammar. AAAI National Conference on Artificial Intelligence, pp 770–776

  70. Minnen D, Essa I, Starner T (n.d.) Expectation grammars: leveraging high-level expectations for activity recognition. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 626–632

  71. Joo SW, Chellappa R (n.d.) Attribute grammar-based event recognition and anomaly detection. IEEE Conference on Computer Vision and Pattern Recognition Workshop, pp 107–114

  72. Siskind JM (n.d.) Grounding the lexical semantics of verbs in visual perception using force dynamics and event logic. J Artif Intell Res 15:31–90

  73. Gupta A, Srinivasan P, Shi J, Davis L (n.d.) Understanding videos, constructing plots learning a visually grounded storyline model from annotated videos. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 2012–2019

  74. Ijsselmuiden J, Stiefelhagen R (n.d.) Towards high-level human activity recognition through computer vision and temporal logic. In: The 33rd annual German conference on advances in artificial intelligence, pp 426–435

  75. Khare M, Jeon M (2022) Multi-resolution approach to human activity recognition in video sequence based on combination of complex wavelet transform, Local Binary Pattern and Zernike moment. Multimed Tools Appl 81(24):34863–34892. https://doi.org/10.1007/S11042-021-11828-6/FIGURES/10

    Article  Google Scholar 

  76. Li C, Huang Q, Li X, Wu Q (2021) Human action recognition based on multi-scale feature maps from depth video sequences. Multimed Tools Appl 80(21–23):32111–32130. https://doi.org/10.1007/S11042-021-11193-4/TABLES/8

    Article  Google Scholar 

  77. Ikizler N, Duygulu PD (n.d.) Histogram of oriented rectangles: a new pose descriptor for human action recognition. Image Vis Comput 27(10):1515–1526. https://doi.org/10.1016/j.imavis.2009.02.002

  78. Kellokumpu V, Zhao G, Pietikäinen M (n.d.) Recognition of human actions using texture descriptors. Mach Vis Appl 22:767–780

  79. Kliper-Gross O, Gurovich Y, Hassner T, Wolf L (n.d.) Motion interchange patterns for action recognition in unconstrained videos. In: European conference on computer vision. Springer, Berlin/Heidelberg, pp 256–269

  80. Jiang YG, Dai Q, Xue X, Liu W, Ngo CW (n.d.) Trajectory-based modeling of human actions with motion reference points. In: European conference on computer vision. Springer, Berlin/Heidelberg, pp 425–438

  81. Wang C, Wang Y, Yuille AL (n.d.) An approach to pose-based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. Portland, OR, USA, pp 915–922

  82. Zanfir M, Leordeanu M, Sminchisescu C (n.d.) The moving pose: an efficient 3d kinematics descriptor for low-latency action recognition and detection. In: Proceedings of the IEEE international conference on computer vision. Sydney, Australia, pp 2752–2759

  83. Chaaraoui AA, Climent-Pérez P, Flórez-Revuelta F (n.d.) Silhouette-based human action recognition using sequences of key poses. Pattern Recogn Lett 34:1799–1807

  84. Rahman SA, Song I, Leung MK, Lee I, Lee K (n.d.) Fast action recognition using negative space features. Expert Syst Appl 41:574–587

  85. Junejo IN, Junejo KN, Al Aghbari Z (n.d) Silhouette-based human action recognition using SAX-shapes. Vis Comput 30:259–269

  86. Vishwakarma DK, Kapoor R, Dhiman A (n.d.) A proposed unified framework for the recognition of human activity by exploiting the characteristics of action dynamics. Robot Auton Syst 77:25–38

  87. Jalal A, Kim YH, Kim YJ, Kamal S, Kim D (n.d.) Robust human activity recognition from depth video using spatiotemporal multi-fused features. Pattern Recogn 61:295–308

  88. Patrona F, Chatzitofis A, Zarpalas D, Daras P (2018) Motion analysis: action detection, recognition and evaluation based on motion capture data. Pattern Recogn 76:612–622

  89. Zhang C, Xu Y, Xu Z et al (2022) Hybrid handcrafted and learned feature framework for human action recognition. Appl Intell 52:12771–12787. https://doi.org/10.1007/s10489-021-03068-w

    Article  Google Scholar 

  90. Bengio Y (n.d) Learning deep architectures for AI. Found Trends Mach Learn 2:1–127

  91. Ji S, Xu W, Yang M, Yu K (n.d.) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231

  92. Weimer D, Scholz-Reiter B, Shpitalni M (n.d.) Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection. CIRP Ann Manuf Technol 65(1):417–420

  93. Le QV (n.d.) Building high-level features using large scale unsupervised learning. In: 2013 IEEE Int. Conf. On acoustics, speech and signal processing (ICASSP)

  94. Huang Y, Lai S-H, Tai S-H (n.d.) Human action recognition based on temporal pose CNN and multidimensional fusion. In: Proceedings of the European conference on computer vision (ECCV)

  95. Min S, Lee B, Yoon S (2017) Deep learning in bioinformatics. Brief Bioinform 18(5):851–869. https://doi.org/10.1093/bib/bbw068

    Article  Google Scholar 

  96. Krizhevsky A, Sutskever I, Hinton GE (n.d.) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, Lake Tahoe, Nevada, pp 1097–1105

  97. Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Li FF (n.d.) Large-scale video classification with convolutional neural networks, Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit, pp 1725–1732

  98. Ravi D, Wong C, Lo B, Yang GZ (n.d.) Deep learning for human action recognition: a resource efficient implementation on low-power devices. In: BSN 2016—13th annual body sensor networks conference, pp 71–76

  99. Marjaneh S, Hassan F (2017) Single image action recognition by predicting space-time saliency

  100. Banerjee A, Singh PK, Sarkar R (n.d.) Fuzzy integral based CNN classifier fusion for 3D skeleton action recognition. IEEE Trans Circ Syst Video Technol 31(6):2206–2216

  101. Ng A (n.d.) Sparse autoencoder. CS294A Lect Note 72:1–19

  102. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A (n.d.) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408

  103. Hasan M, Roy-Chowdhury AK (n.d.) A continuous learning framework for activity recognition using deep hybrid feature models. IEEE Trans Multimed 17:11

  104. Wang X, Gao L, Song J, Zhen X, Sebe N, Shen HT (n.d.) Deep appearance and motion learning for egocentric activity recognition. Neurocomputing 275:438–447

  105. Gao X, Luo H, Wang Q, Zhao F, Ye L, Zhang Y (2019) A human activity recognition algorithm based on stacking Denoising autoencoder and LightGBM. Sensors. 19(4):947. https://doi.org/10.3390/s19040947

    Article  Google Scholar 

  106. Du Y, Wang W, Wang L (n.d.) Hierarchical recurrent neural network for skeleton-based action recognition. Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 1110–1118

  107. Graves A (n.d.) Generating sequences with recurrent neural networks.

  108. Salehinejad H, Sankar S, Barfett J, Colak E, Valaee S (n.d.) Recent advances in recurrent neural networks.

  109. Qi M, Wang Y, Qin J, Li A, Luo J, Gool L (n.d.) stagNet: an attentive semantic RNN for group action and individual action recognition. IEEE Trans Circ Syst Video Technol 30:1

  110. Liu J, Shahroudy A, Xu D, Wang G (n.d.) Spatio-temporal LSTM with trust gates for 3D human action recognition. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 9907. LNCS, pp 816–833

  111. Cho K et al (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing, pp 1724–1734

  112. Goodfellow I et al (n.d.) Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp 2672–2680

  113. Huang GB, Lee H, Learned-Miller E (n.d.) Learning hierarchical representations for face verifcation with convolutional deep belief networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR’ 12), pp 2518–2525

  114. Radford A, Metz L, Chintala S (n.d.) Unsupervised representation learning with deep convolutional generative adversarial networks.

  115. Zadeh MZ, Babu AR, Jaiswal A, Makedon F (n.d.) Self-supervised human activity recognition by augmenting generative adversarial networks, p 11755

  116. Li R, Pan J, Li Z, Tang J (n.d.) Single image Dehazing via conditional generative adversarial network.

  117. Yang Y, Hou C, Lang Y, Guan D, Huang D, Xu J (n.d.) Open-set human activity recognition based on micro-Doppler signatures. Pattern Recogn 85:60–69

  118. Gammulle H, Denman S, Sridharan S, Fookes C (2019) Multi-level sequence GAN for group activity recognition. In: Jawahar C, Li H, Mori G, Schindler K (eds) Computer vision – ACCV 2018. Lecture notes in computer science(), vol 11361. Springer, Cham. https://doi.org/10.1007/978-3-030-20887-5_21

  119. Ahsan U, Sun C, Essa I (n.d.) DiscrimNet: semi-supervised action recognition from videos using generative adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops. Women in computer vision (WiCV’17)

  120. Donahue J et al (n.d.) Long-term recurrent convolutional networks for visual recognition and description. CVPR

  121. Kar A, Rai N, Sikka K, Sharma G (n.d.) Adascan: adaptive scan pooling in deep convolutional neural networks for human action recognition in videos. CVPR

  122. Jaouedi N, Boujnah N, Bouhlel MS (n.d.) A new hybrid deep learning model for human action recognition. J King Saud Univ - Comput Inf Sci 32

  123. Gowda SN (2017) Human activity recognition using combinatorial deep belief networks. In: IEEE computer society conference on computer vision and pattern recognition workshops, pp 1589–1594. https://doi.org/10.1109/CVPRW.2017.203

    Chapter  Google Scholar 

  124. Wu Z, Wang X, Jiang Y-G, Ye H, Xue X (2015) Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In: Proceedings of the 23rd ACM international conference on multimedia (MM '15). Association for Computing Machinery, New York, pp 461–470. https://doi.org/10.1145/2733373.2806222

    Chapter  Google Scholar 

  125. Lv M, Xu W, Chen T (n.d.) A hybrid deep convolutional and recurrent neural network for complex activity recognition using multimodal sensors. Neurocomputing 362

  126. Ij**a EP, Mohan CK (n.d.) Hybrid deep neural network model for human action recognition. Appl. Soft Comput 46:936–952

  127. Al-Azzawi NA (n.d.) Human action recognition based on hybrid deep learning model and Shearlet transform. In: 2020 12th international conference on information technology and electrical engineering (ICITEE, Yogyakarta), pp 152–155

  128. Yadav SK, Tiwari K, Pandey HM, Akbar SA (2022) Skeleton-based human activity recognition using ConvLSTM and guided feature learning. Soft comput 26(2):877–890. https://doi.org/10.1007/S00500-021-06238-7/FIGURES/11

    Article  Google Scholar 

  129. Wensel J, Ullah H, Member S, Munir A, Member S (2022) ViT-ReT: Vision and Recurrent Transformer Neural Networks for Human Activity Recognition in Videos. Accessed: May 11, 2023. [Online]. Available: https://arxiv.org/abs/2208.07929v2

  130. Challa SK, Kumar A, Semwal VB (2022) A multibranch CNN-BiLSTM model for human activity recognition using wearable sensor data. Vis Comput 38(12):4095–4109. https://doi.org/10.1007/S00371-021-02283-3/TABLES/7

    Article  Google Scholar 

  131. Jiang N, Quan W, Geng Q, Shi Z, Xu P (2023) Exploiting 3D human recovery for action recognition with Spatio-temporal bifurcation fusion. In: ICASSP 2023–2023 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 1–5. https://doi.org/10.1109/ICASSP49357.2023.10096404

    Chapter  Google Scholar 

  132. Merlo E, Lagomarsino M, Lamon E, Ajoudani A (2023) Automatic Interaction and Activity Recognition from Videos of Human Manual Demonstrations with Application to Anomaly Detection. Accessed: May 12, 2023. [Online]. Available: http://arxiv.org/abs/2304.09789

  133. Usmani A, Siddiqui N, Islam S (2023) Skeleton joint trajectories based human activity recognition using deep RNN. Multimed Tools Applic 2023:1–25. https://doi.org/10.1007/S11042-023-15024-6

    Article  Google Scholar 

  134. Yin M, He S, Soomro TA, Yuan H (2023) Efficient skeleton-based action recognition via multi-stream depthwise separable convolutional neural network. Expert Syst Appl 226:120080. https://doi.org/10.1016/J.ESWA.2023.120080

    Article  Google Scholar 

  135. Barkoky A, Charkari NM (2022) Complex Network-based features extraction in RGB-D human action recognition. J Vis Commun Image Represent 82:103371. https://doi.org/10.1016/J.JVCIR.2021.103371

    Article  Google Scholar 

  136. Deng L (n.d.) A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans Signal Inf Process 3:2

  137. Dosovitskiy A, Fischer P, Springenberg JT (n.d.) Discriminative unsupervised feature learning with exemplar convolutional neural networks. IEEE Trans Pattern Anal Mach Intell 38(9):1734–1747

  138. Núñez JC, Cabido R, Pantrigo JJ, Montemayor AS, Vélez JF (2018) Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition. Pattern Recogn 76

  139. Dobhal T, Shitole V, Thomas G, Navada G (n.d.) Human activity recognition using binary motion image and deep learning. Procedia Comput Sci 58:178–185

  140. Khelalef A, Ababsa F, Benoudjit N (2019) An efficient human activity recognition technique based on deep learning. Pattern Recognit Image Anal 29:702–715

  141. Si C, Chen W, Wang W, Wang L, Tan T (n.d.) An attention enhanced graph convolutional LSTM network for skeleton-based action recognition, Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1227–1236

  142. Majd M, Safabakhsh R (2020) Correlational convolutional LSTM for human action recognition. Neurocomputing 396:224–229. https://doi.org/10.1016/j.neucom.2018.10.095

    Article  Google Scholar 

  143. Dai C, Liu X, Lai J (n.d.) Human action recognition using two-stream attention-based LSTM networks. Appl Soft Comput

  144. Simonyan K, Zisserman A (n.d.) Two-stream convolutional networks for action recognition in videos. In: Advances in neural information processing systems, pp. 568–576

  145. Ullah A, Muhammad K, Ser JD, Baik SW, Albuquerque VHC (n.d.) Activity Recognition Using Temporal Optical Flow Convolutional Features and Multilayer LSTM. IEEE Trans Ind Electr 66(12):9692–9702

  146. Hinton GE, Osindero S, Teh Y-W (n.d.) A fast-learning algorithm for deep belief nets. Neural Comput 18:1527–1554

  147. Uddin MZ (n.d.) Facial expression recognition utilizing local direction-based robust features and deep belief network. IEEE Access 5:4525–4536

  148. Sheeba PT, SSM, Rani SD (n.d.) Fuzzy Based Deep Belief Network for Activity Recognition. In: Proceedings of International Conference on Recent Trends in Computing, Communication & Networking Technologies (ICRTCCNT)

  149. Lee H, Grosse R, Ranganath R, Ng AY (n.d.) Unsupervised learning of hierarchical representations with convolutional deep belief networks. Commun ACM 54(10):95–103

  150. Li X et al (n.d.) Region-based Activity Recognition Using Conditional GAN. In: Proceedings of the 25th ACM international conference on Multimedia Association for Computing Machinery, New York, NY, USA, pp. 1059–1067

  151. Savadi Hosseini M, Ghaderi F (n.d.) A Hybrid Deep Learning Architecture Using 3D CNNs and GRUs for Human Action Recognition. Int J Eng 33(5):959–965

  152. Wang L, Qiao Y (n.d.) Tang X Action recognition with trajectory-pooled deep-convolutional descriptors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4305–4314

  153. Ullah A, Muhammad K, Haq IU, Baik SW (2019) Action recognition using optimized deep autoencoder and CNN for surveillance data streams of non-stationary environments. Future Gener Comput Syst 96:386–397

  154. Shi Y, Tian Y, Wang Y, Huang T (n.d.) Sequential deep trajectory descriptor for action recognition with three-stream cnn. IEEE Trans Multimed 19(7):1510–1520

  155. Liu M, Liu H, Chen C (n.d.) Enhanced skeleton visualization for view invariant human action recognition. Pattern Recognit 68:346–362

  156. Li C, Wang P, Wang S, Hou Y, Li W (n.d.) Skeleton-based action recognition using LSTM and CNN. In: IEEE international conference on multimedia and expo workshops (ICMEW). IEEE, pp. 585–590

  157. Das S, Chaudhary A, Bremond F, Thonnat M (n.d.) Where to focus on for human action recognition? In: IEEE winter conference on applications of computer vision (WACV). IEEE, pp. 71–80

  158. Ijjina EP, Chalavadi KM (2017) Human action recognition in RGB-D videos using motion sequence information and deep learning. Pattern Recognit 72:504–516

  159. Verma P, Sah A, Srivastava R (n.d.) Deep learning-based multi-modal approach using RGB and skeleton sequences for human activity recognition. Multimed Syst 26:671–685

  160. Tanberk S, Kilimci ZH, Tükel DB, Uysal M, Akyokuş S (n.d.) A Hybrid Deep Model Using Deep Learning and Dense Optical Flow Approaches for Human Activity Recognition. IEEE Access 8:19799–19809

  161. Singh T, Vishwakarma DK (n.d.) A deeply coupled ConvNet for human activity recognition using dynamic and RGB images. Neural Comput Applic 33:469–485

  162. Mukherjee D, Mondal R, Singh PK (n.d.) EnsemConvNet: a deep learning approach for human activity recognition using smartphone sensors for healthcare applications. Multimed Tools Appl 79:31663–31690

  163. Tasnim N, Islam MK, Baek J-H (2021) Deep Learning Based Human Activity Recognition Using Spatio-Temporal Image Formation of Skeleton Joints. Appl Sci 11(6):2675


  164. Bilal M, Maqsood M, Yasmin S (n.d.) A transfer learning-based efficient spatiotemporal human action recognition framework for long and overlapping action classes. J Supercomput 78:2873–2908

  165. Muhammad K et al (n.d.) Human action recognition using attention-based LSTM network with dilated CNN features. Future Gener Comput Syst 125:820–830

  166. Andrade-Ambriz YA, Ledesma S, Ibarra-Manzano M-A, Oros-Flores MI, Almanza-Ojeda D-L (2022) Human activity recognition using temporal convolutional neural network architecture. Expert Syst Appl 191:116287

  167. Ullah A, Muhammad K, Ding W, Palade V, Haq IU, Baik SW (2021) Efficient activity recognition using lightweight CNN and DS-GRU network for surveillance applications. Appl Soft Comput 103:107102. https://doi.org/10.1016/J.ASOC.2021.107102


  168. Yadav SK, Luthra A, Tiwari K, Pandey HM, Akbar SA (2022) ARFDNet: An efficient activity recognition & fall detection system using latent feature pooling. Knowl Based Syst 239:107948. https://doi.org/10.1016/J.KNOSYS.2021.107948


  169. Basak H, Kundu R, Singh PK, Ijaz MF, Woźniak M, Sarkar R (2022) A union of deep learning and swarm-based optimization for 3D human action recognition. Sci Rep 12(1). https://doi.org/10.1038/s41598-022-09293-8

  170. Putra PU, Shima K, Shimatani K (2022) A deep neural network model for multi-view human activity recognition. PLoS One 17(1):e0262181

  171. Sánchez-Caballero A et al (2022) 3DFCNN: real-time action recognition using 3D deep neural networks with raw depth information. Multimed Tools Appl 81(17):24119–24143. https://doi.org/10.1007/S11042-022-12091-Z/TABLES/7


  172. Nasir IM, Raza M, Ulyah SM, Shah JH, Fitriyani NL, Syafrudin M (2023) ENGA: Elastic Net-Based Genetic Algorithm for human action recognition. Expert Syst Appl 227:120311. https://doi.org/10.1016/J.ESWA.2023.120311


  173. Nikpour B, Armanfard N (2023) Spatio-temporal hard attention learning for skeleton-based activity recognition. Pattern Recognit 139:109428. https://doi.org/10.1016/J.PATCOG.2023.109428


  174. Al-Faris M, Chiverton J, Ndzi D, Ahmed AI (2020) A review on computer vision-based methods for human action recognition. J Imaging 6(6):46


Funding

Not applicable.

Author information


Corresponding author

Correspondence to Rahul Kumar.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Conflicts of interest/Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions


About this article


Cite this article

Kumar, R., Kumar, S. A survey on intelligent human action recognition techniques. Multimed Tools Appl 83, 52653–52709 (2024). https://doi.org/10.1007/s11042-023-17529-6

