Abstract
Virtual Reality (VR) enables users to interact with a simulated environment through a head-mounted device (HMD) and two hand controllers. These devices track the user's head and hand positions and movements but provide no reference for the rest of the body. Full-body movement can be tracked with additional equipment such as trackers and bodysuits, but being tightly coupled with hardware makes this option both inconvenient and expensive, and existing computer vision solutions require calibration and costly depth-based cameras. In this paper, we present a novel approach that integrates sensor fusion techniques with computer vision algorithms as an alternative solution for positioning human joints in the virtual environment and simulating their movements. Human landmark identification and pose estimation are achieved using machine learning algorithms for computer vision. These landmarks are mapped to an avatar in virtual space using geometric transformations and inverse kinematics. Sensor fusion ensures the correctness of the scale and transform operations: the HMD position pivots the estimated pose and sets the view for the user, while the hand-controller feed controls and verifies the position of the hands with respect to the upper body. Through sensor fusion, we bypass the complicated setup required by other computer vision (CV) approaches. The solution is built with open-source tools and frameworks, and the rest of the paper discusses the related work, design overview, and results in detail.
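The pivot-and-scale step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the dictionary-of-landmarks representation, and the choice of computing scale from the head-to-controller distance are all assumptions made for this example.

```python
def pivot_and_scale(landmarks, head_key, hand_key, hmd_pos, controller_pos):
    """Map camera-space pose landmarks into VR space.

    landmarks: dict mapping joint name -> (x, y, z) in camera space.
    The tracked HMD position pivots the pose (the head landmark is moved
    to the HMD), and the hand controller provides a metric reference that
    fixes the overall scale of the estimated skeleton.
    """
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    head = landmarks[head_key]
    hand = landmarks[hand_key]
    # Scale factor: ratio of the tracked head-to-controller distance to
    # the estimated head-to-hand distance (sensor fusion fixes the scale).
    estimated = dist(head, hand)
    tracked = dist(hmd_pos, controller_pos)
    scale = tracked / estimated if estimated else 1.0

    mapped = {}
    for name, point in landmarks.items():
        # Translate so the head is the origin, scale, then move to the HMD.
        mapped[name] = tuple(h + scale * (p - c)
                             for p, c, h in zip(point, head, hmd_pos))
    return mapped
```

After this mapping, the avatar's head coincides with the HMD and the estimated hand lands on the controller, so the controller feed can be used to verify the rest of the upper-body pose.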
Supported by the University of Southern California.
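The inverse-kinematics step mentioned in the abstract can be illustrated with a standard two-bone analytic solution (law of cosines). This is a generic sketch, not the paper's solver; the planar two-bone chain and angle conventions are assumptions for this example.

```python
import math


def two_bone_ik(l1, l2, target_dist):
    """Analytic two-bone IK for a planar chain (e.g. upper arm + forearm).

    Given bone lengths l1 and l2 and the distance to the target, return
    (shoulder_angle, elbow_angle) in radians, where shoulder_angle is the
    angle between the first bone and the line to the target, and
    elbow_angle is the interior angle at the elbow joint.
    """
    # Clamp the target distance to the reachable range of the chain.
    d = max(min(target_dist, l1 + l2), abs(l1 - l2))
    # Law of cosines for the interior elbow angle.
    cos_elbow = (l1 ** 2 + l2 ** 2 - d ** 2) / (2.0 * l1 * l2)
    elbow = math.acos(max(-1.0, min(1.0, cos_elbow)))
    # Law of cosines for the shoulder offset toward the target.
    cos_shoulder = (l1 ** 2 + d ** 2 - l2 ** 2) / (2.0 * l1 * d)
    shoulder = math.acos(max(-1.0, min(1.0, cos_shoulder)))
    return shoulder, elbow
```

With the hand position fixed by the controller and the shoulder inferred from the estimated pose, a solver like this determines plausible elbow placement without any additional trackers.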
© 2022 Springer Nature Switzerland AG
Jalapati, P., Naraparaju, S., Yao, P., Zyda, M. (2022). Integrating Sensor Fusion with Pose Estimation for Simulating Human Interactions in Virtual Reality. In: Chen, J.Y.C., Fragomeni, G., Degen, H., Ntoa, S. (eds) HCI International 2022 – Late Breaking Papers: Interacting with eXtended Reality and Artificial Intelligence. HCII 2022. Lecture Notes in Computer Science, vol 13518. Springer, Cham. https://doi.org/10.1007/978-3-031-21707-4_6
DOI: https://doi.org/10.1007/978-3-031-21707-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21706-7
Online ISBN: 978-3-031-21707-4
eBook Packages: Computer Science, Computer Science (R0)