Abstract
Graph convolutional networks (GCNs) have received increasing attention in skeleton-based action recognition. Many existing GCN models emphasize spatial information and neglect temporal information, even though every action unfolds through changes over time. In addition, the channel, spatial, and temporal dimensions often contain redundant information. In this paper, we design a focus-on-temporal graph convolutional network (FTGCN) module that captures more temporal information and balances it appropriately for each action. To better integrate channel, spatial, and temporal information, we propose a unified channel, spatial, and temporal attention model (CSTA). A basic block containing these two components is called FTC-GCN. Extensive experiments on two large-scale datasets, comparing against 17 methods on NTU-RGB+D and 8 methods on Kinetics-Skeleton, show that our method achieves the best performance among the compared methods for skeleton-based human action recognition.
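To make the idea of attention over channel, spatial, and temporal dimensions concrete, the following is a minimal sketch and not the CSTA module itself, whose design is not given in the abstract. It assumes a skeleton feature tensor of shape (C, T, V) for channels, frames, and joints, and it uses parameter-free average pooling with a sigmoid gate in place of the learned layers a real model would use. The function name `unified_attention` and the tensor layout are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function, mapping scores into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def unified_attention(x):
    """Reweight a (C, T, V) feature tensor along all three axes.

    Hypothetical illustration: each gate is computed by averaging over
    the other two axes, so no learned parameters are involved.
    """
    # One score per channel, per frame, and per joint.
    channel_gate = sigmoid(x.mean(axis=(1, 2)))   # shape (C,)
    temporal_gate = sigmoid(x.mean(axis=(0, 2)))  # shape (T,)
    spatial_gate = sigmoid(x.mean(axis=(0, 1)))   # shape (V,)

    # Broadcast the three gates into a single (C, T, V) weighting.
    gate = (
        channel_gate[:, None, None]
        * temporal_gate[None, :, None]
        * spatial_gate[None, None, :]
    )
    return x * gate, (channel_gate, temporal_gate, spatial_gate)


# Example: 3 channels, 4 frames, 5 joints of random features.
features = np.random.default_rng(0).standard_normal((3, 4, 5))
weighted, gates = unified_attention(features)
```

Because every gate lies strictly between 0 and 1, the output never exceeds the input in magnitude; features along axes with low average activation are suppressed the most, which is one simple way to reduce redundancy across the three dimensions.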
Cite this article
Gao, BK., Dong, L., Bi, HB. et al. Focus on temporal graph convolutional networks with unified attention for skeleton-based action recognition. Appl Intell 52, 5608–5616 (2022). https://doi.org/10.1007/s10489-021-02723-6
DOI: https://doi.org/10.1007/s10489-021-02723-6