Abstract
Networked 3D virtual environments allow multiple users to interact over the Internet by means of avatars and to experience a degree of virtual telepresence. However, controlling an avatar may be tedious. Motion capture systems based on 3D sensors have reached the consumer market, but webcams remain cheaper and more widespread. This work aims at animating a user’s avatar by real-time motion capture using a personal computer and a plain webcam. Following a classical model-based approach, we register a 3D articulated upper-body model onto video sequences and propose a number of heuristics to accelerate particle filtering while robustly tracking user motion. Describing the body pose by the 3D positions of the wrists rather than by joint angles allows efficient handling of depth ambiguities in probabilistic tracking. We demonstrate experimentally the robustness of our real-time monocular 3D body tracking, even under partial occlusions and motion in the depth direction.
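As a rough illustration of the kind of probabilistic tracking the abstract refers to, the sketch below runs a generic bootstrap particle filter over a single 3D wrist position observed through a pinhole camera. It is not the authors' method and includes none of their acceleration heuristics; the function names (`project`, `particle_filter_step`, `estimate`), the focal length, and the noise parameters are all hypothetical choices made for this example. A single 2D observation constrains only the ratios x/z and y/z, so the hypotheses are free to spread along the viewing ray, which is one way to picture the depth ambiguity of monocular vision.

```python
import math
import random


def project(point, focal=500.0):
    """Pinhole projection of a 3D point (x, y, z) to image coordinates (u, v).

    The focal length (in pixels) is an assumed value for illustration.
    """
    x, y, z = point
    return (focal * x / z, focal * y / z)


def particle_filter_step(particles, observation, rng,
                         motion_sigma=0.02, pixel_sigma=5.0):
    """One predict / weight / resample cycle of a bootstrap particle filter.

    particles   -- list of (x, y, z) hypotheses for the wrist position
    observation -- observed (u, v) image position of the wrist
    rng         -- a random.Random instance
    """
    # Predict: random-walk motion model, with depth clamped to stay positive.
    predicted = []
    for x, y, z in particles:
        predicted.append((
            x + rng.gauss(0.0, motion_sigma),
            y + rng.gauss(0.0, motion_sigma),
            max(z + rng.gauss(0.0, motion_sigma), 0.1),
        ))

    # Weight: Gaussian likelihood of the reprojection error in pixels.
    weights = []
    for p in predicted:
        u, v = project(p)
        err2 = (u - observation[0]) ** 2 + (v - observation[1]) ** 2
        weights.append(math.exp(-0.5 * err2 / pixel_sigma ** 2))

    # Fall back to uniform weights if every likelihood underflowed to zero.
    if sum(weights) == 0.0:
        weights = [1.0] * len(predicted)

    # Resample: draw hypotheses with probability proportional to their weight.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate(particles):
    """Mean of the particle set, used here as the point estimate."""
    n = len(particles)
    return tuple(sum(p[i] for p in particles) / n for i in range(3))


if __name__ == "__main__":
    rng = random.Random(0)
    # Initial hypotheses scattered around a point about 2 m from the camera.
    cloud = [(rng.gauss(0.2, 0.05), rng.gauss(-0.1, 0.05), rng.gauss(2.0, 0.3))
             for _ in range(500)]
    target_uv = project((0.2, -0.1, 2.0))
    for _ in range(10):
        cloud = particle_filter_step(cloud, target_uv, rng)
    print("estimated wrist position:", estimate(cloud))
    print("reprojected estimate:", project(estimate(cloud)))
```

In this toy setting the reprojected estimate should settle near the observed image point, while the depth component stays comparatively uncertain because nothing in the likelihood constrains it.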
Cite this article
Jáuregui, D.A.G., Horain, P. Real-time 3D motion capture by monocular vision and virtual rendering. Machine Vision and Applications 28, 839–858 (2017). https://doi.org/10.1007/s00138-017-0861-3
DOI: https://doi.org/10.1007/s00138-017-0861-3