Abstract
Different modalities extracted from videos, such as RGB frames and optical flow, can provide complementary cues for video action recognition. In this paper, we introduce a new modality, named video sketch, which encodes human shape information, as a complementary modality for video action representation. We show that video action recognition can be enhanced by using the proposed video sketch. More specifically, we first generate video sketches with class-distinctive action areas and then employ a two-stream network to combine the shape information extracted from the image-based sketch and the point-based sketch, fusing the classification scores of the two streams to produce a shape representation for videos. Finally, we use this shape representation as a complement to the traditional appearance (RGB) and motion (optical flow) representations for the final video classification. We conduct extensive experiments on five human action recognition datasets: KTH, HMDB51, UCF101, Something-Something, and UT-Interaction (UTI). The experimental results show that the proposed method outperforms existing state-of-the-art action recognition methods.
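The abstract describes late fusion of per-stream classification scores (image-based sketch, point-based sketch, and later RGB and optical-flow streams). The paper's exact fusion scheme and weights are not given in the abstract; the sketch below is a minimal, generic weighted late-fusion example, where the stream names and weights are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_scores(stream_logits, weights=None):
    """Weighted late fusion of per-stream classification scores.

    stream_logits: list of (num_classes,) logit vectors, one per stream
    weights: optional per-stream fusion weights (defaults to uniform)
    Returns the predicted class index and the fused probability vector.
    """
    probs = [softmax(np.asarray(l, dtype=np.float64)) for l in stream_logits]
    if weights is None:
        weights = np.full(len(probs), 1.0 / len(probs))
    fused = sum(w * p for w, p in zip(weights, probs))
    return int(np.argmax(fused)), fused

# Toy example: hypothetical RGB, optical-flow, and sketch streams, 3 classes.
rgb = [2.0, 0.5, 0.1]
flow = [1.5, 1.8, 0.2]
sketch = [0.3, 2.2, 0.4]
label, fused = fuse_scores([rgb, flow, sketch], weights=[0.5, 0.25, 0.25])
```

The same pattern applies at both fusion stages in the pipeline: first across the two sketch streams to form the shape score, then across the shape, appearance, and motion scores for the final prediction.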
Acknowledgements
This work was supported by the Fundamental Research Funds for the Central Universities (2018YJS045, 2019JBZ104).
Cite this article
Zhang, XY., Huang, YP., Mi, Y. et al. Video sketch: A middle-level representation for action recognition. Appl Intell 51, 2589–2608 (2021). https://doi.org/10.1007/s10489-020-01905-y