Abstract
Human action recognition (HAR) has been a well-studied research topic in computer vision for the past two decades. The objective of HAR is to detect and recognize actions performed by one or more persons from a series of observations. In this paper, a comparative study of different feature descriptors for HAR on video datasets is presented. In particular, we compute four standard feature descriptors, namely the histogram of oriented gradients (HOG), the gray-level co-occurrence matrix (GLCM), speeded-up robust features (SURF), and the GIST descriptor, from RGB videos after performing background subtraction and enclosing the human subject in a minimum bounding box. To further speed up the overall process, we apply an efficient sparse filtering method, which reduces the number of features by eliminating redundant ones and assigning weights to those that remain. Finally, the performance of these feature descriptors is analyzed on three standard benchmark video datasets, namely KTH, HMDB51, and UCF11.
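The front of the pipeline the abstract describes — background subtraction, a minimum bounding box around the subject, and a texture descriptor on the cropped region — can be sketched in plain NumPy. This is a minimal illustration, not the chapter's implementation: frame differencing with a fixed threshold stands in for the adaptive (Otsu) thresholding the authors use, and only the GLCM descriptor with two of Haralick's statistics (contrast and energy) is shown; HOG, SURF, GIST, and the sparse-filtering step are omitted. All function names here are illustrative.

```python
import numpy as np

def foreground_bbox(frame, background, thresh=30):
    """Frame differencing against a static background, then the minimum
    bounding box of the foreground mask. `thresh` is a fixed illustrative
    threshold; the chapter selects it adaptively with Otsu's method."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    ys, xs = np.nonzero(diff > thresh)
    if ys.size == 0:
        return None  # no moving subject in this frame
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1

def glcm_features(patch, levels=8, dx=1, dy=0):
    """GLCM for a single pixel offset (dx, dy), plus two of Haralick's
    statistics (contrast and energy) derived from it."""
    q = (patch.astype(int) * levels) // 256   # quantize to `levels` gray bins
    h, w = q.shape
    glcm = np.zeros((levels, levels))
    src = q[:h - dy, :w - dx]                 # reference pixels
    dst = q[dy:, dx:]                         # their offset neighbours
    for a, b in zip(src.ravel(), dst.ravel()):
        glcm[a, b] += 1
    glcm /= glcm.sum()                        # normalize to a joint probability
    i, j = np.indices((levels, levels))
    contrast = float(((i - j) ** 2 * glcm).sum())
    energy = float((glcm ** 2).sum())
    return contrast, energy

# Toy example: a textured square appears against a dark static background.
rng = np.random.default_rng(0)
background = np.zeros((64, 64), dtype=np.uint8)
frame = background.copy()
frame[20:40, 10:30] = rng.integers(100, 256, (20, 20), dtype=np.uint8)

y0, y1, x0, x1 = foreground_bbox(frame, background)
contrast, energy = glcm_features(frame[y0:y1, x0:x1])
```

In the full pipeline, descriptors such as these would be computed per frame (or per cropped patch), concatenated, pruned by the sparse-filtering step, and then passed to a classifier.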
References
Jaimes, A., & Sebe, N. (2005). Multimodal human computer interaction: A survey. In International workshop on human-computer interaction (pp. 1–15). Berlin, Heidelberg: Springer.
Osmani, V., Balasubramaniam, S., & Botvich, D. (2008). Human activity recognition in pervasive healthcare: Supporting efficient remote collaboration. Journal of Network and Computer Applications, 31, 628–655. https://doi.org/10.1016/j.jnca.2007.11.002
Jain, A. K., & Li, S. Z. (2011). Handbook of face recognition (Vol. 1). New York: Springer.
Du, Y., Wang, W., & Wang, L. (2015). Hierarchical recurrent neural network for skeleton based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1110–1118).
Ji, S., Xu, W., Yang, M., & Yu, K. (2012). 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 221–231.
Gammulle, H., Denman, S., Sridharan, S., & Fookes, C. (2017, March). Two stream LSTM: A deep fusion framework for human action recognition. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 177–186). IEEE. https://doi.org/10.1109/WACV.2017.27
Sharif, M., Attique, M., Farooq, K., Jamal, Z., Shah, H., & Akram, T. (2019). Human action recognition: A framework of statistical weighted segmentation and rank correlation-based selection. Pattern Analysis and Applications. https://doi.org/10.1007/s10044-019-00789-0
Ullah, A., Muhammad, K., Haq, I. U., & Baik, S. W. (2019). Action recognition using optimized deep autoencoder and CNN for surveillance data streams of non-stationary environments. Future Generation Computer Systems, 96, 386–397. https://doi.org/10.1016/j.future.2019.01.029
Wang, T., Duan, P., & Ma, B. (2019). Action recognition using dynamic hierarchical trees. Journal of Visual Communication and Image Representation, 61, 315–325.
Singh, R., Kushwaha, A. K. S., & Srivastava, R. (2019). Multi-view recognition system for human activity based on multiple features for video surveillance system. Multimedia Tools and Applications, 78(12), 17165–17196.
Sahoo, S. P., Silambarasi, R., & Ari, S. (2019, March). Fusion of histogram based features for human action recognition. In 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS) (pp. 1012–1016). IEEE.
Chakraborty, B., Holte, M. B., Moeslund, T. B., González, J., & Roca, F. X. (2011). A selective spatio-temporal interest point detector for human action recognition in complex scenes. In 2011 International Conference on Computer Vision (pp. 1776–1783).
Li, B., Ayazoglu, M., Mao, T., Camps, O. I., & Sznaier, M. (2011). Activity recognition using dynamic subspace angles. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2011.5995672
Rao, A. S., Gubbi, J., Rajasegarar, S., Marusic, S., & Palaniswami, M. (2014). Detection of anomalous crowd behaviour using hyperspherical clustering. In 2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA). https://doi.org/10.1109/DICTA.2014.7008100.
Mahadevan, V., Li, W., Bhalodia, V., & Vasconcelos, N. (2010). Anomaly detection in crowded scenes. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE). https://doi.org/10.1109/CVPR.2010.5539872
Yu, T., Kim, T., & Cipolla, R. (2010). Real-time action recognition by spatiotemporal semantic and structural forests. In British Machine Vision Conference (pp. 52.1–52.12). https://doi.org/10.5244/C.24.52
Shabani, A. H., Clausi, D. A., & Zelek, J. S. (2011). Improved spatio-temporal salient feature detection for action recognition. British Machine Vision Conference, 1, 1–12.
Zhen, X., & Shao, L. (2016). Action recognition via spatio-temporal local features: A comprehensive study. Image and Vision Computing, 50, 1–13. https://doi.org/10.1016/J.IMAVIS.2016.02.006.
Sharif, M., Khan, M. A., Akram, T., Javed, M. Y., Saba, T., & Rehman, A. (2017). A framework of human detection and action recognition based on uniform segmentation and combination of Euclidean distance and joint entropy-based features selection. EURASIP Journal on Image and Video Processing, 2017(1). https://doi.org/10.1186/s13640-017-0236-8
Wu, J., Hu, D., & Chen, F. (2014). Action recognition by hidden temporal models. The Visual Computer, 30(12), 1395–1404. https://doi.org/10.1007/s00371-013-0899-9
Elgammal, A., Duraiswami, R., Harwood, D., & Davis, L. S. (2002). Background and foreground modeling using nonparametric kernel density estimation for visual surveillance. Proceedings of the IEEE, 90(7), 1151–1163.
Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1), 62–66.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) (Vol. 1, pp. 886–893).
Ngiam, J., Chen, Z., Bhaskar, S. A., Koh, P. W., & Ng, A. Y. (2011). Sparse filtering. In Advances in neural information processing systems (pp. 1125–1133). Retrieved from https://papers.nips.cc/paper/4334-sparse-filtering.pdf
Bay, H., Tuytelaars, T., & Van Gool, L. (2008). SURF: Speeded up robust features. Computer Vision and Image Understanding, 110(3), 346–359. https://doi.org/10.1016/j.cviu.2007.09.014
Haralick, R. M., Shanmugam, K., & Dinstein, I. H. (1973). Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, SMC-3(6), 610–621.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175. https://doi.org/10.1023/A:1011139631724.
Schuldt, C., Laptev, I., & Caputo, B. (2004). Recognizing human actions: A local SVM approach. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004) (Vol. 3, pp. 32–36).
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., & Serre, T. (2011). HMDB: A large video database for human motion recognition. In 2011 International Conference on Computer Vision. https://doi.org/10.1109/ICCV.2011.6126543
Liu, J., Luo, J., & Shah, M. (2009). Recognizing realistic actions from videos "in the wild." In 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–8). https://doi.org/10.1109/CVPR.2009.5206744
Evgeniou, T., & Pontil, M. (2001). Support vector machines: Theory and applications. In Machine Learning and Its Applications: Advanced Lectures. Springer. https://doi.org/10.1007/3-540-44673-7
Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3), 175–185.
Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832–844.
Kim, H. J., Lee, J. S., & Yang, H. S. (2007). Human action recognition using a modified convolutional neural network. In International Symposium on Neural Networks (pp. 715–723). Berlin, Heidelberg: Springer.
Zhou, W., & Zhang, Z. (2014). Human action recognition with multiple-instance markov model. IEEE Transactions on Information Forensics and Security, 9, 1581–1591.
Chen, C. Y., & Grauman, K. (2016). Efficient activity detection in untrimmed video with max-subgraph search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(5), 908–921.
Yuan, C., Li, X., Hu, W., Ling, H., & Maybank, S. (2013). 3D R transform on spatio-temporal interest points for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 724–730). https://doi.org/10.1109/CVPR.2013.99
Yan, X., & Luo, Y. (2012). Recognizing human actions using a new descriptor based on spatial-temporal interest points and weighted-output. Neurocomputing, 87.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
Cite this chapter
Sadhukhan, S., Mallick, S., Singh, P.K., Sarkar, R., Bhattacharjee, D. (2020). A Comparative Study of Different Feature Descriptors for Video-Based Human Action Recognition. In: Mandal, J., Banerjee, S. (eds) Intelligent Computing: Image Processing Based Applications. Advances in Intelligent Systems and Computing, vol 1157. Springer, Singapore. https://doi.org/10.1007/978-981-15-4288-6_3
Print ISBN: 978-981-15-4287-9
Online ISBN: 978-981-15-4288-6