Abstract
We present an unsupervised learning method for the task of monocular depth estimation. In common with many recent works, we train a convolutional neural network (CNN) on stereo pair images, using view reconstruction as a self-supervisory signal. In contrast to previous work, we employ a stereo camera parameter estimation network to make our model robust to training data diversity. Another of our contributions is the introduction of self-supervision correction, which addresses a serious drawback of stereo pair self-supervision in unsupervised monocular depth estimation: at later training stages, self-supervision by view reconstruction fails to improve the predicted depth map due to various ambiguities in the input images. We mitigate this problem by making the depth estimation CNN produce both a depth map and a correction map used to modify the input stereo pair images in the areas of ambiguity. Our contributions allow us to achieve state-of-the-art results on the KITTI driving dataset (among unsupervised methods) by training our model on a hybrid city-driving dataset.
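The view-reconstruction signal described above can be sketched in a few lines: given a predicted disparity map for rectified stereo, one view is warped horizontally to reconstruct the other, and the photometric difference serves as the training loss. The following NumPy snippet is a simplified illustration of that idea, not the authors' implementation; the function names and the plain L1 photometric loss are our own assumptions (the paper's actual loss and network are more elaborate).

```python
import numpy as np

def reconstruct_left(right, disparity):
    """Reconstruct the left view by sampling the right view at x - d(x).

    For rectified stereo, a left-image pixel at column x corresponds to the
    right-image pixel at column x - d, where d is the (positive) disparity.
    Uses linear interpolation along rows; both inputs are (H, W) float arrays.
    """
    h, w = right.shape
    xs = np.arange(w)[None, :] - disparity        # sampling coordinates per pixel
    xs = np.clip(xs, 0.0, w - 1.0)                # clamp at image borders
    x0 = np.floor(xs).astype(int)                 # left neighbour for interpolation
    x1 = np.clip(x0 + 1, 0, w - 1)                # right neighbour
    frac = xs - x0                                # interpolation weight
    rows = np.arange(h)[:, None]                  # broadcast row indices
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def photometric_loss(target, reconstructed):
    """Mean absolute photometric error between target and warped view."""
    return float(np.mean(np.abs(target - reconstructed)))
```

In training, the loss is back-propagated through the (differentiable) warp into the disparity-predicting network, so no ground-truth depth is ever needed; the correction map proposed in the paper would additionally adjust the input images where this photometric signal is ambiguous.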
V. Anisimovskiy and A. Shcherbinin contributed equally.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Anisimovskiy, V., Shcherbinin, A., Turko, S., Kurilin, I. (2020). Unsupervised Monocular Depth Estimation CNN Robust to Training Data Diversity. In: Goutte, C., Zhu, X. (eds) Advances in Artificial Intelligence. Canadian AI 2020. Lecture Notes in Computer Science(), vol 12109. Springer, Cham. https://doi.org/10.1007/978-3-030-47358-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-47357-0
Online ISBN: 978-3-030-47358-7