Abstract
Most 3D reconstruction methods can recover scene properties only up to a global scale ambiguity. We present a novel approach to single view metrology that recovers the absolute scale of a scene, represented by the 3D heights of objects or the camera height above the ground, as well as the camera's orientation and field of view, from a single monocular image acquired under unconstrained conditions. Our method relies on data-driven priors learned by a deep network specifically designed to absorb weakly supervised constraints arising from the interplay between the unknown camera and 3D entities such as object heights, through the estimation of bounding box projections. We leverage categorical priors for objects that commonly occur in natural images, such as humans and cars, as references for scale estimation. We demonstrate state-of-the-art qualitative and quantitative results on several datasets, as well as applications including virtual object insertion. Furthermore, the perceptual quality of our outputs is validated by a user study.
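The scale-from-reference idea underlying this line of work can be illustrated with the classical single-view metrology relation (a textbook result, not the paper's learned model): assuming a pinhole camera with negligible roll and an object resting on the ground plane, the ratio of image heights below the horizon equals the ratio of world heights relative to the camera height. A minimal sketch:

```python
def object_height(y_top, y_bottom, y_horizon, camera_height):
    """Classical single-view metrology height relation.

    For an object standing on the ground plane, viewed by a pinhole
    camera with negligible roll, similar triangles give
        h = H * (y_bottom - y_top) / (y_bottom - y_horizon),
    where H is the camera height above the ground and image y
    coordinates grow downward (so y_bottom > y_top > y_horizon
    for an upright object below the horizon).
    """
    return camera_height * (y_bottom - y_top) / (y_bottom - y_horizon)

# Hypothetical example: a bounding box spanning rows 300 (head) to
# 500 (feet), horizon at row 260, camera 1.6 m above the ground.
h = object_height(y_top=300, y_bottom=500, y_horizon=260, camera_height=1.6)
# h = 1.6 * 200 / 240, i.e. about 1.33 m
```

The same relation can be inverted: given a reference object of known categorical height (e.g. an average person), it yields the camera height, which is one way weak supervision on bounding boxes can constrain absolute scale.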
© 2020 Springer Nature Switzerland AG
Cite this paper
Zhu, R. et al. (2020). Single View Metrology in the Wild. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12356. Springer, Cham. https://doi.org/10.1007/978-3-030-58621-8_19
Print ISBN: 978-3-030-58620-1
Online ISBN: 978-3-030-58621-8