Abstract
Human action recognition is an important yet challenging task. Human actions usually involve human-object interactions, highly articulated motions, high intra-class variation, and complicated temporal structure. Recently developed commodity depth sensors open up new possibilities for dealing with this problem by providing 3D depth data of the scene. This information not only enables a rather powerful human motion capture technique, but also makes it possible to efficiently model human-object interactions and intra-class variations. In this chapter, we propose to characterize human actions with a novel actionlet ensemble model, in which each actionlet represents the interaction of a subset of human joints. The proposed model is robust to noise, invariant to translational and temporal misalignment, and capable of characterizing both human motion and human-object interactions. We evaluate the proposed approach on three challenging action recognition datasets captured by Kinect devices, a multiview action recognition dataset captured with Kinect devices, and a dataset captured by a motion capture system. The experimental evaluations show that the proposed approach achieves superior performance to state-of-the-art algorithms.
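The core idea above can be made concrete with a minimal sketch. Here an "actionlet" is modeled as a subset of joint indices with a base classifier over the features of those joints, and the ensemble scores an action by combining the base-classifier outputs. The function names, the linear base classifiers, and the weighted-sum combination rule are illustrative assumptions, not the chapter's actual learning procedure:

```python
import numpy as np

def actionlet_score(features, joint_subset, w, b):
    """Score of one linear base classifier over a subset of joints.

    features: dict mapping joint index -> feature vector for that joint.
    joint_subset: tuple of joint indices forming this actionlet.
    """
    x = np.concatenate([features[j] for j in joint_subset])
    return float(w @ x + b)

def ensemble_score(features, actionlets):
    """Weighted combination of per-actionlet scores.

    actionlets: list of (weight, joint_subset, w, b) tuples.
    """
    return sum(alpha * actionlet_score(features, subset, w, b)
               for alpha, subset, w, b in actionlets)

# Toy usage: 3 joints, each with a 2-D feature vector.
feats = {0: np.array([1.0, 0.0]),
         1: np.array([0.0, 1.0]),
         2: np.array([0.5, 0.5])}
actionlets = [
    (0.7, (0, 1), np.array([1.0, 0.0, 0.0, 1.0]), 0.0),  # joints 0 and 1
    (0.3, (2,),   np.array([2.0, 0.0]), -0.5),           # joint 2 alone
]
print(ensemble_score(feats, actionlets))  # 0.7*2.0 + 0.3*0.5 = 1.55
```

Because each actionlet looks only at a subset of joints, noise in an irrelevant joint does not affect the actionlets that exclude it, which is one intuition behind the model's robustness claim.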
Notes
1. This dataset will be released to the public.
Cite this chapter
Wang, J., Liu, Z., Wu, Y. (2014). Learning Actionlet Ensemble for 3D Human Action Recognition. In: Human Action Recognition with Depth Cameras. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-04561-0_2
© 2014 The Author(s)
Print ISBN: 978-3-319-04560-3
Online ISBN: 978-3-319-04561-0