Expressive Telepresence via Modular Codec Avatars

  • Conference paper
  • In: Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12357)

Abstract

VR telepresence consists of interacting with another human in a virtual space represented by an avatar. Today most avatars are cartoon-like, but soon the technology will allow video-realistic ones. This paper takes a step in this direction and presents Modular Codec Avatars (MCA), a method to generate hyper-realistic faces driven by the cameras in a VR headset. MCA extends traditional Codec Avatars (CA) by replacing the holistic model with a learned modular representation. Traditional person-specific CAs are learned from few training samples and typically lack robustness as well as expressiveness when transferring facial expressions. MCAs solve these issues by learning a modulated adaptive blending of different facial components together with an exemplar-based latent alignment. We demonstrate that MCA achieves improved expressiveness and robustness with respect to CA on a variety of real-world datasets and in practical scenarios. Finally, we showcase new applications in VR telepresence enabled by the proposed model.
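
To make the modular idea concrete, here is a minimal sketch of modulated adaptive blending in the spirit the abstract describes: each facial component gets its own small codec, and a modulation network predicts adaptive weights that fuse the per-component reconstructions into one face. All module names, shapes, and architectural details below are illustrative assumptions for exposition, not the authors' implementation (PyTorch).

    # Hypothetical sketch of modular blending: per-component encoder/decoders
    # plus a small network that predicts adaptive blending weights. Shapes and
    # names are assumptions, not the MCA architecture from the paper.
    import torch
    import torch.nn as nn

    class ComponentCodec(nn.Module):
        """Encoder/decoder for one facial region (assumed stand-in for an MCA module)."""
        def __init__(self, in_ch=3, latent_dim=64):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, latent_dim),
            )
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 64 * 8 * 8), nn.ReLU(),
                nn.Unflatten(1, (64, 8, 8)),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
            )

        def forward(self, crop):
            z = self.encoder(crop)        # per-component latent code
            return z, self.decoder(z)     # reconstructed component texture

    class ModularAvatar(nn.Module):
        """Blends per-component reconstructions with weights modulated by the latents."""
        def __init__(self, n_components=4, latent_dim=64):
            super().__init__()
            self.codecs = nn.ModuleList(ComponentCodec(latent_dim=latent_dim)
                                        for _ in range(n_components))
            # Predicts one blending weight per component from all latent codes.
            self.modulator = nn.Linear(n_components * latent_dim, n_components)

        def forward(self, crops):
            # crops: list of per-region headset-camera crops, each (B, 3, 32, 32)
            latents, recons = zip(*(codec(c) for codec, c in zip(self.codecs, crops)))
            weights = torch.softmax(self.modulator(torch.cat(latents, dim=1)), dim=1)
            # Weighted blend of component reconstructions into a single face.
            stacked = torch.stack(recons, dim=1)  # (B, K, 3, H, W)
            return (weights[:, :, None, None, None] * stacked).sum(dim=1)

    # Toy usage: four 32x32 crops (e.g. eyes, mouth regions), batch of 2.
    crops = [torch.randn(2, 3, 32, 32) for _ in range(4)]
    face = ModularAvatar()(crops)
    print(face.shape)  # torch.Size([2, 3, 32, 32])

The exemplar-based latent alignment mentioned in the abstract is omitted here; the sketch only illustrates how component-wise codes can be fused through learned, input-dependent blending weights.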

Author information

Corresponding author

Correspondence to Hang Chu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 294 KB)

Supplementary material 2 (mp4 26829 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Chu, H., Ma, S., De la Torre, F., Fidler, S., Sheikh, Y. (2020). Expressive Telepresence via Modular Codec Avatars. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12357. Springer, Cham. https://doi.org/10.1007/978-3-030-58610-2_20

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58610-2_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58609-6

  • Online ISBN: 978-3-030-58610-2

  • eBook Packages: Computer Science, Computer Science (R0)
