Abstract
Human action recognition is an important yet challenging task. Human actions usually involve human-object interactions, highly articulated motions, high intra-class variation, and complicated temporal structure. Recently developed commodity depth sensors open up new possibilities for dealing with this problem by providing 3D depth data of the scene. This information not only enables a rather powerful human motion capture technique, but also makes it possible to efficiently model human-object interactions and intra-class variations. In this chapter, we propose to characterize human actions with a novel actionlet ensemble model, in which each actionlet represents the interaction of a subset of human joints. The proposed model is robust to noise, invariant to translational and temporal misalignment, and capable of characterizing both human motion and human-object interactions. We evaluate the proposed approach on three challenging action recognition datasets captured by Kinect devices, a multiview action recognition dataset captured with Kinect devices, and a dataset captured by a motion capture system. The experimental evaluations show that the proposed approach achieves superior performance to state-of-the-art algorithms.
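The core idea above can be made concrete with a minimal sketch. Here an "actionlet" is modeled as a subset of joint indices with a base classifier over the features of those joints, and the ensemble scores an action by combining the base-classifier outputs. The function names, the linear base classifiers, and the weighted-sum combination rule are illustrative assumptions, not the chapter's actual learning procedure:

```python
import numpy as np

def actionlet_score(features, joint_subset, w, b):
    """Score of one linear base classifier over a subset of joints.

    features: dict mapping joint index -> feature vector for that joint.
    joint_subset: tuple of joint indices forming this actionlet.
    """
    x = np.concatenate([features[j] for j in joint_subset])
    return float(w @ x + b)

def ensemble_score(features, actionlets):
    """Weighted combination of per-actionlet scores.

    actionlets: list of (weight, joint_subset, w, b) tuples.
    """
    return sum(alpha * actionlet_score(features, subset, w, b)
               for alpha, subset, w, b in actionlets)

# Toy usage: 3 joints, each with a 2-D feature vector.
feats = {0: np.array([1.0, 0.0]),
         1: np.array([0.0, 1.0]),
         2: np.array([0.5, 0.5])}
actionlets = [
    (0.7, (0, 1), np.array([1.0, 0.0, 0.0, 1.0]), 0.0),  # joints 0 and 1
    (0.3, (2,),   np.array([2.0, 0.0]), -0.5),           # joint 2 alone
]
print(ensemble_score(feats, actionlets))  # 0.7*2.0 + 0.3*0.5 = 1.55
```

Because each actionlet looks only at a subset of joints, noise in an irrelevant joint does not affect the actionlets that exclude it, which is one intuition behind the model's robustness claim.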
Notes
1. This dataset will be released to the public.
Cite this chapter
Wang, J., Liu, Z., Wu, Y. (2014). Learning Actionlet Ensemble for 3D Human Action Recognition. In: Human Action Recognition with Depth Cameras. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-04561-0_2
© 2014 The Author(s)
Print ISBN: 978-3-319-04560-3
Online ISBN: 978-3-319-04561-0