Abstract
The collection of video data for action recognition is highly susceptible to measurement bias: the equipment used, the camera angle, and the environmental conditions all strongly affect the distribution of the collected dataset. Training a classifier that generalizes to new data therefore becomes a very hard problem, since it is impossible to gather sufficiently general training sets. Recent approaches in the literature attempt to solve this problem by augmenting a given training set with synthetic data, so as to better represent the global distribution of the covariates. However, these approaches are limited because they rely on hand-crafted data synthesizers, which are typically hard to implement and problem-specific. In this work, we propose a different approach to these issues, which relies on the combination of two techniques, pose extraction and domain adaptation, as a means to improve the generalization capabilities of classifiers. We show that adapted skeletal representations can be retrieved automatically in a semi-supervised setting, and that these help classifiers generalize to new forms of measurement bias. We empirically validate our approach for generalizing across different camera angles.
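To make the adversarial domain-adaptation idea concrete, the following is a minimal NumPy sketch of domain-adversarial training with gradient reversal on toy feature vectors standing in for extracted skeletal poses. The linear models, dimensions, and hyperparameters (`lam`, `lr`) are illustrative assumptions, not the architecture used in the paper: a shared feature extractor minimizes the action-classification loss on the labelled source domain while maximizing a domain discriminator's loss, so it is pushed toward representations that are invariant to the domain shift (e.g. a camera-angle change).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "skeletal" feature vectors: a labelled source domain and an
# unlabelled target domain whose mean is shifted, mimicking a
# camera-angle measurement bias.
Xs = rng.normal(0.0, 1.0, (64, 8))
ys = (Xs[:, 0] > 0).astype(float)         # binary "action" label
Xt = rng.normal(0.5, 1.0, (64, 8))        # target domain: no action labels

W = rng.normal(0, 0.1, (8, 4))   # shared linear feature extractor
c = rng.normal(0, 0.1, 4)        # action-classifier head
d = rng.normal(0, 0.1, 4)        # domain-discriminator head
lam, lr, eps = 0.1, 0.05, 1e-7   # adversarial weight, step size, log guard

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

history = []
for _ in range(200):
    Fs, Ft = Xs @ W, Xt @ W
    ps = sigmoid(Fs @ c)                       # task prediction (source only)
    qs, qt = sigmoid(Fs @ d), sigmoid(Ft @ d)  # domain prediction (0=src, 1=tgt)

    # Binary cross-entropy of the action classifier on the source domain.
    history.append(-np.mean(ys * np.log(ps + eps)
                            + (1 - ys) * np.log(1 - ps + eps)))

    # Gradients of the two losses w.r.t. the pre-sigmoid logits.
    g_task = (ps - ys) / len(ys)
    g_dom_s, g_dom_t = qs / len(qs), (qt - 1) / len(qt)

    # Each head descends its own loss.
    c -= lr * Fs.T @ g_task
    d -= lr * (Fs.T @ g_dom_s + Ft.T @ g_dom_t)

    # Gradient reversal: the shared extractor descends the task loss but
    # ASCENDS the domain loss (note the minus sign on gW_dom), so W learns
    # features the discriminator cannot separate.
    gW_task = Xs.T @ np.outer(g_task, c)
    gW_dom = Xs.T @ np.outer(g_dom_s, d) + Xt.T @ np.outer(g_dom_t, d)
    W -= lr * (gW_task - lam * gW_dom)

print(f"task loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

In a full system the linear maps would be replaced by deep networks over pose sequences, and `lam` would typically be annealed during training; the sketch only illustrates the sign flip that makes the feature extractor and the domain discriminator adversaries.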
Acknowledgements
This project has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under Grant Agreement No. 273 (Funding Decision: GGET122785/I2/19-07-2018). We also acknowledge support of this work by the Project SYNTELESIS “Innovative Technologies and Applications based on the Internet of Things (IoT) and the Cloud Computing” (MIS 5002521) which is implemented under the “Action for the Strategic Development on the Research and Technological Sector”, funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014–2020) and co-financed by Greece and the European Union (European Regional Development Fund).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Pikramenos, G., Mathe, E., Vali, E. et al. An adversarial semi-supervised approach for action recognition from pose information. Neural Comput & Applic 32, 17181–17195 (2020). https://doi.org/10.1007/s00521-020-05162-5