Abstract
With economic development and rising living standards, social robots are gradually entering people's daily lives. Human–robot interaction is a basic function of social robots, and achieving a better human–robot interaction experience is an important issue in the field of social robotics. Single-person pose estimation (SPPE) is a core technology for human–robot interaction in social robots. Benefiting from advances in deep learning, SPPE has made great progress. This paper reviews the development of SPPE from four aspects: data augmentation, the evolution of SPPE models, learning targets, and post-processing. In addition, we introduce the commonly used datasets and evaluation metrics. Finally, we discuss the open problems of SPPE and outline future research trends.
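Three of the aspects named above (learning target, post-processing, and evaluation metrics) can be made concrete in a few lines. The sketch below is illustrative only and is not the method of any particular surveyed paper. It assumes a heatmap-based pipeline where each keypoint's learning target is a 2D Gaussian (sigma = 2 heatmap pixels is an assumed default). Post-processing takes the argmax and then applies the widely used heuristic of shifting it a quarter pixel toward the higher neighbour. Evaluation uses a PCK-style score that, as a simplifying assumption, ignores visibility flags. The function names `gaussian_heatmap`, `decode_heatmap`, and `pck` are hypothetical.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=2.0):
    """Learning target: encode one keypoint at (cx, cy) as a 2D Gaussian.

    Returns `height` rows of `width` values between 0 and 1; the largest value
    is at the pixel nearest to (cx, cy). sigma is in heatmap pixels (assumed).
    """
    denom = 2.0 * sigma ** 2
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / denom) for x in range(width)]
        for y in range(height)
    ]


def _quarter_shift(diff):
    # Move 0.25 px toward the larger neighbour; no move if they are equal.
    if diff > 0:
        return 0.25
    if diff < 0:
        return -0.25
    return 0.0


def decode_heatmap(heatmap):
    """Post-processing: recover (x, y) from a predicted heatmap.

    Takes the hard argmax, then refines it by a quarter pixel along each axis
    toward the higher neighbouring response (skipped on the border).
    """
    height, width = len(heatmap), len(heatmap[0])
    best_y, best_x = max(
        ((y, x) for y in range(height) for x in range(width)),
        key=lambda p: heatmap[p[0]][p[1]],
    )
    x, y = float(best_x), float(best_y)
    if 0 < best_x < width - 1:
        x += _quarter_shift(heatmap[best_y][best_x + 1] - heatmap[best_y][best_x - 1])
    if 0 < best_y < height - 1:
        y += _quarter_shift(heatmap[best_y + 1][best_x] - heatmap[best_y - 1][best_x])
    return x, y


def pck(pred, gt, ref_length, alpha=0.5):
    """Evaluation: fraction of keypoints predicted within alpha * ref_length.

    With ref_length set to the head-segment length and alpha = 0.5 this
    matches the PCKh@0.5 threshold; visibility flags are ignored here.
    """
    threshold = alpha * ref_length
    hits = sum(math.dist(p, g) <= threshold for p, g in zip(pred, gt))
    return hits / len(gt)


if __name__ == "__main__":
    hm = gaussian_heatmap(64, 64, cx=20.4, cy=30.0)
    est = decode_heatmap(hm)
    print(est)  # (20.25, 30.0)
    print(pck([est], [(20.4, 30.0)], ref_length=10.0))  # 1.0
```

In the usage example, the argmax alone would report x = 20. The quarter-pixel shift moves the estimate to 20.25, closer to the true 20.4. This illustrates why decoding from a discrete heatmap introduces quantisation error and why post-processing is treated as a research topic in its own right.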
Cite this article
Zhang, F., Zhu, X. & Wang, C. A Comprehensive Survey on Single-Person Pose Estimation in Social Robotics. Int J of Soc Robotics 14, 1995–2008 (2022). https://doi.org/10.1007/s12369-020-00739-5