Abstract
Recently, graph convolutional networks (GCNs) have been widely applied to 3D human pose regression and have shown encouraging performance. One limitation of this approach is that it models only the semantic correlation between 2D joint features and ignores the variability of that correlation. To address this limitation, we propose a non-autoregressive decoding model based on joint classification (JC-NARD), which treats 3D joint regression as a sequence analysis problem. The model splits the joints into several joint sub-sequences according to their connections and semantic correlation. Within each sub-sequence, it then uses an attention mechanism to model the correlation between 3D-3D joint features and between 2D-3D joint features, which establishes a 3D spatial constraint between joints. To verify the accuracy and generalization of the model, we combine it with several 3D human pose regression networks; the performance of every combined model improves by 1.2–4.5 mm.
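The grouping-plus-attention idea in the abstract can be illustrated with a small sketch. This is not the authors' JC-NARD implementation: the 17-joint skeleton, the `JOINT_GROUPS` partition, the feature size, and the residual updates are all assumptions made for illustration, and plain scaled dot-product attention stands in for whatever attention variant the paper uses. It only shows the general pattern of refining all joints in one pass (non-autoregressively) while restricting 3D-3D and 2D-3D attention to each sub-sequence.

```python
import math
import random

# Hypothetical partition of a 17-joint skeleton into sub-sequences.
# The abstract does not give the actual grouping; this one is illustrative.
JOINT_GROUPS = {
    "torso": [0, 7, 8, 9, 10],
    "right_leg": [1, 2, 3],
    "left_leg": [4, 5, 6],
    "left_arm": [11, 12, 13],
    "right_arm": [14, 15, 16],
}

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

def add(xs, ys):
    """Element-wise sum of two lists of vectors (residual connection)."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]

def decode_groups(feat_2d, query_3d, groups=JOINT_GROUPS):
    """Refine every 3D joint query in one pass, attending only within its group."""
    out = [None] * len(query_3d)
    for idx in groups.values():
        q = [query_3d[j] for j in idx]
        f = [feat_2d[j] for j in idx]
        q = add(q, attention(q, q, q))  # 3D-3D: joints in a group attend to each other
        q = add(q, attention(q, f, f))  # 2D-3D: 3D queries attend to the group's 2D features
        for j, vec in zip(idx, q):
            out[j] = vec
    return out

random.seed(0)
dim = 8  # assumed feature size
feat_2d = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(17)]
query_3d = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(17)]
refined = decode_groups(feat_2d, query_3d)
print(len(refined), len(refined[0]))
```

Because attention is confined to each sub-sequence, changing the 2D feature of one joint affects only the outputs of joints in the same group. This is one simple way to read the "3D spatial constraint between joints" described above.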
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Guo, Y., Fu, D., Yang, T. (2021). A Non-autoregressive Decoding Model Based on Joint Classification for 3D Human Pose Regression. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13020. Springer, Cham. https://doi.org/10.1007/978-3-030-88007-1_36
DOI: https://doi.org/10.1007/978-3-030-88007-1_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88006-4
Online ISBN: 978-3-030-88007-1
eBook Packages: Computer Science, Computer Science (R0)