Unsupervised Human Pose Estimation on Depth Images

Blanc-Beyne, Thibault; Carlier, Axel; Mouysset, Sandrine; Charvillat, Vincent

doi:10.1007/978-3-030-67667-4_22


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12460)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases


Abstract

Human pose estimation is a widely studied problem in computer vision that consists of regressing body joint coordinates from an image. Most state-of-the-art techniques rely on RGB or RGB-D data, but, driven by an industrial use case aimed at preventing musculoskeletal disorders, we focus on estimating human pose from depth images only. In this paper, we propose an approach for predicting 3D human pose in challenging depth images using an image-to-image translation mechanism. As our dataset consists only of unlabelled data, we generate an annotated set of synthetic depth images using a 3D human model that provides geometric features of the pose. To match the challenging nature of our real depth images as closely as possible, we first refine the synthetic depth images with an image-to-image translation approach based on a modified CycleGAN. This architecture is trained to render realistic depth images from synthetic ones while preserving the human pose. We then pair the labels of our synthetic data with the realistic outputs of the CycleGAN to train a convolutional neural network for pose estimation. Our experiments show that the proposed unsupervised framework achieves good results on both usual and challenging datasets.
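The abstract names the ingredients of the pipeline but not the exact losses or the modification made to CycleGAN. As a minimal sketch, assuming the standard unmodified CycleGAN objective (a least-squares adversarial term plus an L1 cycle-consistency term weighted by λ = 10, the defaults of the original CycleGAN rather than values confirmed for this work), the translation objective and the label-transfer step could look like the following. The generators `g_syn2real` / `g_real2syn`, the discriminators and every helper name here are hypothetical stand-ins for the actual networks, and the authors' modification is not reproduced.

```python
import numpy as np


def lsgan_generator_loss(scores_fake):
    """Least-squares adversarial loss for a generator: push D(G(x)) towards 1."""
    return float(np.mean((scores_fake - 1.0) ** 2))


def lsgan_discriminator_loss(scores_real, scores_fake):
    """Least-squares loss for a discriminator: real scores -> 1, fake scores -> 0."""
    return float(0.5 * (np.mean((scores_real - 1.0) ** 2) + np.mean(scores_fake ** 2)))


def cycle_loss(x, x_reconstructed):
    """L1 cycle-consistency: translating to the other domain and back should return x."""
    return float(np.mean(np.abs(x - x_reconstructed)))


def generator_objective(x_syn, x_real, g_syn2real, g_real2syn,
                        disc_real, disc_syn, lam=10.0):
    """Total CycleGAN generator loss for one synthetic batch and one real batch.

    g_syn2real / g_real2syn map depth images between the two domains;
    disc_real / disc_syn score how "real" or "synthetic" an image looks.
    All four are hypothetical callables standing in for the networks.
    """
    fake_real = g_syn2real(x_syn)   # synthetic depth image made to look real
    fake_syn = g_real2syn(x_real)   # real depth image made to look synthetic
    adversarial = (lsgan_generator_loss(disc_real(fake_real))
                   + lsgan_generator_loss(disc_syn(fake_syn)))
    cycle = (cycle_loss(x_syn, g_real2syn(fake_real))
             + cycle_loss(x_real, g_syn2real(fake_syn)))
    return adversarial + lam * cycle


def build_pose_training_set(synthetic_images, synthetic_joints, g_syn2real):
    """Pair each refined image with the joint labels of its synthetic source.

    Assumes the translation preserves the pose, so the synthetic labels
    remain valid for the refined, realistic-looking image.
    """
    return [(g_syn2real(x), joints)
            for x, joints in zip(synthetic_images, synthetic_joints)]


# Toy usage with hypothetical stand-in "networks".
def always_real(img):
    return np.ones(1)        # discriminator that is fully fooled


def double(img):
    return 2.0 * img


def halve(img):
    return 0.5 * img


def identity(img):
    return img


x_syn = np.ones((4, 4))
x_real = np.ones((4, 4))
# Exact inverse generators: no cycle error and a fooled discriminator.
print(generator_objective(x_syn, x_real, double, halve, always_real, always_real))
# Non-inverse pair: the cycle term dominates the objective.
print(generator_objective(x_syn, x_real, double, identity, always_real, always_real))
```

The cycle term is what makes the label transfer plausible: a generator that must be invertible is discouraged from moving limbs, so the pose annotations of the synthetic image can be reused for its refined version.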

T. Blanc-Beyne—This work was supported by CIFRE ANRT 2017/0311.



Author information

Authors and Affiliations

Ebhys, ZA La Cigalière III, 84250, Le Thor, France
Thibault Blanc-Beyne
Université de Toulouse - IRIT, 2 rue Charles Camichel, 31079, Toulouse, France
Thibault Blanc-Beyne, Axel Carlier, Sandrine Mouysset & Vincent Charvillat


Corresponding author

Correspondence to Thibault Blanc-Beyne.

Editor information

Editors and Affiliations

Microsoft Research, Redmond, WA, USA
Yuxiao Dong
Jožef Stefan Institute, Ljubljana, Slovenia
Dunja Mladenić
Amazon Alexa Knowledge, Cambridge, UK
Craig Saunders


About this paper

Cite this paper

Blanc-Beyne, T., Carlier, A., Mouysset, S., Charvillat, V. (2021). Unsupervised Human Pose Estimation on Depth Images. In: Dong, Y., Mladenić, D., Saunders, C. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12460. Springer, Cham. https://doi.org/10.1007/978-3-030-67667-4_22


DOI: https://doi.org/10.1007/978-3-030-67667-4_22
Published: 25 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67666-7
Online ISBN: 978-3-030-67667-4
eBook Packages: Computer Science, Computer Science (R0)
