Multi-person Absolute 3D Human Pose Estimation with Weak Depth Supervision

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2020 (ICANN 2020)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 12396))

Included in the following conference series:

Abstract

In 3D human pose estimation one of the biggest problems is the lack of large, diverse datasets. This is especially true for multi-person 3D pose estimation, where, to our knowledge, there are only machine generated annotations available for training. To mitigate this issue, we introduce a network that can be trained with additional RGB-D images in a weakly supervised fashion. Due to the existence of cheap sensors, videos with depth maps are widely available, and our method can exploit a large, unannotated dataset. Our algorithm is a monocular, multi-person, absolute pose estimator. We evaluate the algorithm on several benchmarks, showing a consistent improvement in error rates. Also, our model achieves state-of-the-art results on the MuPoTS-3D dataset by a considerable margin. Our code will be publicly available (https://github.com/vegesm/wdspose).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now
Chapter
EUR 29.95
Price includes VAT (Germany)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
EUR 85.59
Price includes VAT (Germany)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
EUR 106.99
Price includes VAT (Germany)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free ship** worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. ar**v preprint, ar**v:1607.06450 (2016)

  2. Belagiannis, V., Amin, S., Andriluka, M., Schiele, B., Navab, N., Ilic, S.: 3D pictorial structures for multiple human pose estimation. In: CVPR, pp. 1669–1676, June 2014

    Google Scholar 

  3. Cai, Y., Ge, L., Cai, J., Yuan, J.: Weakly-supervised 3D hand pose estimation from monocular RGB images. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 678–694. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_41

    Chapter  Google Scholar 

  4. Dabral, R., Gundavarapu, N.B., Mitra, R., Sharma, A., Ramakrishnan, G., Jain, A.: Multi-person 3D human pose estimation from monocular images. In: 3DV (2019)

    Google Scholar 

  5. Drover, D., Mv, R., Chen, C.H., Agrawal, A., Tyagi, A., Huynh, C.P.: Can 3D pose be learned from 2D projections alone? In: ECCV Workshops, pp. 78–94 (2019)

    Google Scholar 

  6. Fang, H.S., Xu, Y., Wang, W., Liu, X., Zhu, S.C.: Learning pose grammar to encode human body configuration for 3D pose estimation. In: AAAI (2018)

    Google Scholar 

  7. Firman, M.: RGBD datasets: past, present and future. In: CVPR Workshops, June 2016

    Google Scholar 

  8. Geman, S., McClure, D.E.: Statistical methods for tomographic image reconstruction. Bull. Int. Stat. Inst. 52(4), 5–21 (1987)

    MathSciNet  Google Scholar 

  9. He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: ICCV, October 2017

    Google Scholar 

  10. Hossain, M.R.I., Little, J.J.: Exploiting temporal information for 3D pose estimation (2017)

    Google Scholar 

  11. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML, pp. 448–456 (2015)

    Google Scholar 

  12. Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6m: large scale datasets and predictive methods for 3D human sensing in natural environments. TPAMI 36(7), 1325–1339 (2014)

    Google Scholar 

  13. Joo, H., et al.: Panoptic studio: a massively multiview system for social motion capture. In: ICCV (2015)

    Google Scholar 

  14. Kocabas, M., Karagoz, S., Akbas, E.: Self-supervised learning of 3D human pose using multi-view geometry. In: CVPR, June 2019

    Google Scholar 

  15. Li, C., Lee, G.H.: Generating multiple hypotheses for 3D human pose estimation with mixture density network. In: CVPR, June 2019

    Google Scholar 

  16. Li, S., Chan, A.B.: 3D human pose estimation from monocular images with deep convolutional neural network. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds.) ACCV 2014. LNCS, vol. 9004, pp. 332–347. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16808-1_23

    Chapter  Google Scholar 

  17. Li, Z., Snavely, N.: Megadepth: learning single-view depth prediction from internet photos. In: CVPR (2018)

    Google Scholar 

  18. Martinez, J., Hossain, R., Romero, J., Little, J.J.: A simple yet effective baseline for 3D human pose estimation. In: ICCV, pp. 2659–2668 (2017)

    Google Scholar 

  19. Mehta, D., et al.: Monocular 3D human pose estimation in the wild using improved CNN supervision. In: 3DV, pp. 506–516 (2017)

    Google Scholar 

  20. Mehta, D., et al.: Single-shot multi-person 3D pose estimation from monocular RGB. In: 3DV, September 2018

    Google Scholar 

  21. Moon, G., Chang, J.Y., Lee, K.M.: Towards 3D human pose estimation in the wild: a weakly-supervised approach. In: ICCV, October 2019

    Google Scholar 

  22. Pavlakos, G., Zhou, X., Daniilidis, K.: Ordinal depth supervision for 3D human pose estimation. In: CVPR (2018)

    Google Scholar 

  23. Pavlakos, G., Zhou, X., Derpanis, K.G., Daniilidis, K.: Coarse-to-fine volumetric prediction for single-image 3D human pose. In: CVPR, pp. 1263–1272 (2017)

    Google Scholar 

  24. Ramakrishna, V., Kanade, T., Sheikh, Y.: Reconstructing 3D human pose from 2D image landmarks. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 573–586. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33765-9_41

    Chapter  Google Scholar 

  25. Rhodin, H., Salzmann, M., Fua, P.: Unsupervised geometry-aware representation for 3D human pose estimation. In: ECCV (2018)

    Google Scholar 

  26. Rogez, G., Weinzaepfel, P., Schmid, C.: LCR-Net: localization-classification-regression for human pose. In: CVPR, pp. 1216–1224, July 2017

    Google Scholar 

  27. Rogez, G., Weinzaepfel, P., Schmid, C.: LCR-Net++: multi-person 2D and 3D pose detection in natural images. TPAMI 42(5), 1146–1161 (2019)

    Google Scholar 

  28. Sigal, L., Balan, A.O., Black, M.J.: HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. IJCV 87(1), 4 (2009)

    Google Scholar 

  29. Sun, K., **ao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: CVPR (2019)

    Google Scholar 

  30. Sun, X., **ao, B., Liang, S., Wei, Y.: Integral human pose regression. In: ECCV, pp. 529–545, September 2018

    Google Scholar 

  31. Véges, M., Lőrincz, A.: Absolute human pose estimation with depth prediction network. In: IJCNN, pp. 1–7, July 2019

    Google Scholar 

  32. Véges, M., Varga, V., Lőrincz, A.: 3D human pose estimation with siamese equivariant embedding. Neurocomputing 339, 194–201 (2019)

    Article  Google Scholar 

  33. Wan, C., Probst, T., Gool, L.V., Yao, A.: Self-supervised 3D hand pose estimation through training by fitting. In: CVPR, pp. 10853–10862 (2019)

    Google Scholar 

  34. Wang, L., et al.: Generalizing monocular 3D human pose estimation in the wild. In: ICCV Workshops, October 2019

    Google Scholar 

  35. Wasenmüller, O., Stricker, D.: Comparison of Kinect V1 and V2 depth images in terms of accuracy and precision. In: Chen, C.-S., Lu, J., Ma, K.-K. (eds.) ACCV 2016. LNCS, vol. 10117, pp. 34–45. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54427-4_3

    Chapter  Google Scholar 

  36. Yuan, S., Stenger, B., Kim, T.K.: RGB-based 3D hand pose estimation via privileged learning with depth images (2018)

    Google Scholar 

  37. Zanfir, A., Marinoiu, E., Sminchisescu, C.: Monocular 3D pose and shape estimation of multiple people in natural scenes the importance of multiple scene constraints. In: CVPR, pp. 2148–2157 (2018)

    Google Scholar 

  38. Zanfir, A., Marinoiu, E., Zanfir, M., Popa, A.I., Sminchisescu, C.: Deep network for the integrated 3D sensing of multiple people in natural images. In: NIPS, pp. 8410–8419 (2018)

    Google Scholar 

  39. Zhou, X., Leonardos, S., Hu, X., Daniilidis, K.: 3D shape estimation from 2D landmarks: a convex relaxation approach. In: CVPR, pp. 4447–4455 (2015)

    Google Scholar 

  40. Zhou, X., Huang, Q., Sun, X., Xue, X., Wei, Y.: Towards 3D human pose estimation in the wild: a weakly-supervised approach. In: ICCV, pp. 398–407, October 2017

    Google Scholar 

Download references

Acknowledgment

MV received support from the European Union and co-financed by the European Social Fund (EFOP-3.6.3-16-2017-00002). AL was supported by the National Research, Development and Innovation Fund of Hungary via the Thematic Excellence Programme funding scheme under Project no. ED_18-1-2019-0030 titled Application-specific highly reliable IT solutions.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Márton Véges .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Véges, M., Lőrincz, A. (2020). Multi-person Absolute 3D Human Pose Estimation with Weak Depth Supervision. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science(), vol 12396. Springer, Cham. https://doi.org/10.1007/978-3-030-61609-0_21

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-61609-0_21

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61608-3

  • Online ISBN: 978-3-030-61609-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Navigation