Learned Monocular Depth Priors in Visual-Inertial Initialization

  • Conference paper
  • Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13682)

Abstract

Visual-inertial odometry (VIO) is the pose estimation backbone for most AR/VR and autonomous robotic systems today, in both academia and industry. However, these systems are highly sensitive to the initialization of key parameters such as sensor biases, gravity direction, and metric scale. In practical scenarios where high-parallax or variable-acceleration assumptions are rarely met (e.g., a hovering aerial robot, or a smartphone AR user not gesticulating with the phone), classical visual-inertial initialization formulations often become ill-conditioned and/or fail to converge meaningfully. In this paper we target visual-inertial initialization specifically for these low-excitation scenarios critical to in-the-wild usage. We propose to circumvent the limitations of classical visual-inertial structure-from-motion (SfM) initialization by incorporating a new learning-based measurement as a higher-level input. We leverage learned monocular depth images (mono-depth) to constrain the relative depth of features, and upgrade the mono-depths to metric scale by jointly optimizing for their scales and shifts. Our experiments show a significant improvement in problem conditioning compared to a classical formulation for visual-inertial initialization, and demonstrate significant accuracy and robustness improvements relative to the state of the art on public benchmarks, particularly under low-excitation scenarios. We further integrate our method into an existing odometry system to illustrate the impact of the improved initialization on the resulting tracking trajectories.

Acknowledgements

We thank Josh Hernandez and Maksym Dzitsiuk for their support in developing our real-time system implementation.

Author information

Correspondence to Yunwen Zhou.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 248 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhou, Y. et al. (2022). Learned Monocular Depth Priors in Visual-Inertial Initialization. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13682. Springer, Cham. https://doi.org/10.1007/978-3-031-20047-2_32

  • DOI: https://doi.org/10.1007/978-3-031-20047-2_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20046-5

  • Online ISBN: 978-3-031-20047-2

  • eBook Packages: Computer Science, Computer Science (R0)
