A Comparative Study of Different Feature Descriptors for Video-Based Human Action Recognition

Chapter in Intelligent Computing: Image Processing Based Applications

Abstract

Human action recognition (HAR) has been a well-studied research topic in the field of computer vision for the past two decades. The objective of HAR is to detect and recognize actions performed by one or more persons from a series of observations. In this chapter, a comparative study of different feature descriptors applied to HAR on video datasets is presented. In particular, we compute four standard feature descriptors, namely the histogram of oriented gradients (HOG), the gray-level co-occurrence matrix (GLCM), speeded-up robust features (SURF), and the GIST (spatial envelope) descriptor, from RGB videos after performing background subtraction and fitting a minimum bounding box around the human subject. To speed up the overall process, we apply an efficient sparse filtering method, which reduces the number of features by eliminating redundant features and weighting those that remain. Finally, the performance of these feature descriptors is analyzed on three standard benchmark video datasets, namely KTH, HMDB51, and UCF11.
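The preprocessing pipeline described above (background subtraction, minimum bounding box, descriptor extraction) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a static reference background, uses simple frame differencing in place of a learned background model, and replaces the full block-normalized HOG descriptor with a single global orientation histogram.

```python
import numpy as np

def subtract_background(frame, background, thresh=25):
    """Absolute-difference background subtraction -> binary foreground mask."""
    return np.abs(frame.astype(int) - background.astype(int)) > thresh

def bounding_box(mask):
    """Minimum bounding box (top, bottom, left, right) around foreground pixels."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

def orientation_histogram(patch, n_bins=9):
    """Simplified HOG-style descriptor: one global histogram of unsigned
    gradient orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)           # fold into [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)                 # L1-normalize

# Toy example: a static background plus a textured rectangle as the "person".
background = np.zeros((64, 64), dtype=np.uint8)
frame = background.copy()
frame[20:40, 25:45] = 100 + np.arange(20) * 5        # horizontal intensity ramp

mask = subtract_background(frame, background)
t, b, l, r = bounding_box(mask)
descriptor = orientation_histogram(frame[t:b + 1, l:r + 1])
print((t, b, l, r), descriptor.shape)                # (20, 39, 25, 44) (9,)
```

Since the ramp varies only horizontally, virtually all gradient energy falls in the first orientation bin; in the actual pipeline this per-box descriptor would then be passed through sparse filtering before classification.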



Author information

Corresponding author: Pawan Kumar Singh


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Sadhukhan, S., Mallick, S., Singh, P.K., Sarkar, R., Bhattacharjee, D. (2020). A Comparative Study of Different Feature Descriptors for Video-Based Human Action Recognition. In: Mandal, J., Banerjee, S. (eds) Intelligent Computing: Image Processing Based Applications. Advances in Intelligent Systems and Computing, vol 1157. Springer, Singapore. https://doi.org/10.1007/978-981-15-4288-6_3
