Abstract
This paper presents a novel framework for human action recognition based on sparse coding. We introduce an effective coding scheme that aggregates low-level descriptors into the super descriptor vector (SDV). To incorporate spatio-temporal information, we propose the super location vector (SLV), which models the space-time locations of local interest points far more compactly than spatio-temporal pyramid representations. SDV and SLV are then combined into the super sparse coding vector (SSCV), which jointly models motion, appearance, and location cues. This representation is computationally efficient and yields superior performance with linear classifiers. In extensive experiments, our approach significantly outperforms state-of-the-art results on two public benchmark datasets, HMDB51 and YouTube.
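The abstract does not give the coding or aggregation formulas, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that sparse codes are computed by a plain ISTA solver, that each dictionary atom accumulates coefficient-weighted residuals, and that the same aggregation is applied to (x, y, t) locations using hypothetical per-atom location centers. All function names, parameters, and the normalization step are illustrative assumptions.

```python
import numpy as np

def sparse_codes(X, D, lam=0.1, n_iter=100):
    """ISTA for min_A 0.5*||X - A D||_F^2 + lam*||A||_1.
    X: (n, d) descriptors, D: (k, d) dictionary atoms. Returns A: (n, k)."""
    L = np.linalg.norm(D @ D.T, 2)            # Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        Z = A - ((A @ D - X) @ D.T) / L       # gradient step
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft threshold
    return A

def aggregate(P, C, A):
    """For each atom k, sum_i A[i, k] * (P[i] - C[k]); flatten and normalize.
    P: (n, m) points, C: (k, m) centers, A: (n, k) codes. Returns (k*m,)."""
    V = A.T @ P - A.sum(axis=0)[:, None] * C
    v = V.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))       # assumed power normalization
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v        # assumed L2 normalization

def sscv_sketch(X, locs, D, C):
    """Concatenate a descriptor part (SDV-like) and a location part (SLV-like).
    locs: (n, 3) space-time locations, C: (k, 3) hypothetical location centers."""
    A = sparse_codes(X, D)
    return np.concatenate([aggregate(X, D, A), aggregate(locs, C, A)])
```

Under these assumptions the location part adds only 3k dimensions to the kd-dimensional descriptor part, whereas a spatio-temporal pyramid would multiply the descriptor dimension by the number of cells. The resulting vector could then be passed to a linear classifier.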
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Yang, X., Tian, Y. (2014). Action Recognition Using Super Sparse Coding Vector with Spatio-temporal Awareness. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8690. Springer, Cham. https://doi.org/10.1007/978-3-319-10605-2_47
DOI: https://doi.org/10.1007/978-3-319-10605-2_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10604-5
Online ISBN: 978-3-319-10605-2
eBook Packages: Computer Science, Computer Science (R0)