Single View Metrology in the Wild

  • Conference paper
  • Computer Vision – ECCV 2020 (ECCV 2020)

Abstract

Most 3D reconstruction methods can recover scene properties only up to a global scale ambiguity. We present a novel approach to single view metrology that recovers the absolute scale of a scene, represented by the 3D heights of objects or the camera height above the ground, as well as the camera orientation and field of view, from just a monocular image acquired in unconstrained conditions. Our method relies on data-driven priors learned by a deep network specifically designed to absorb weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights, through the estimation of bounding box projections. We leverage categorical priors for objects that commonly occur in natural images, such as humans or cars, as references for scale estimation. We demonstrate state-of-the-art qualitative and quantitative results on several datasets, as well as applications including virtual object insertion. Furthermore, the perceptual quality of our outputs is validated by a user study.
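
To give a sense of the geometry the abstract refers to, the sketch below illustrates the classical horizon-based relation from single view metrology that ties a grounded object's bounding box, the horizon line, and the camera height. It is not the authors' implementation: the function names and numbers are hypothetical, and the formula is the standard approximation that assumes zero camera roll and an object resting on the ground plane.

```python
# Minimal sketch (not the paper's code) of the horizon-based height relation
# that underlies reference-object-based scale estimation. Image rows grow
# downward; the object is assumed to stand on the ground plane and the
# camera roll is assumed to be zero.

def object_height_from_bbox(v_top, v_bottom, v_horizon, camera_height_m):
    """Estimate the metric height of a grounded object from its bounding box.

    v_top, v_bottom : image rows (pixels) of the box top and bottom edges
    v_horizon       : image row (pixels) of the horizon line
    camera_height_m : camera height above the ground plane, in meters
    """
    if v_bottom <= v_horizon:
        raise ValueError("Object bottom must lie below the horizon "
                         "for it to rest on the visible ground plane.")
    # H / h_cam ≈ (v_bottom - v_top) / (v_bottom - v_horizon)
    return camera_height_m * (v_bottom - v_top) / (v_bottom - v_horizon)


def camera_height_from_reference(v_top, v_bottom, v_horizon, object_height_m):
    """Invert the same relation: infer camera height from a reference object
    of roughly known height (e.g., an average adult of about 1.7 m)."""
    return object_height_m * (v_bottom - v_horizon) / (v_bottom - v_top)


if __name__ == "__main__":
    # Toy example: a box spanning rows 220 (top) to 460 (bottom) in an image
    # whose horizon sits at row 300, seen by a camera 1.6 m above the ground.
    print(object_height_from_bbox(220, 460, 300, 1.6))        # ≈ 2.40 m object
    print(camera_height_from_reference(220, 460, 300, 1.7))   # ≈ 1.13 m camera
```

In this spirit, detections of common object categories with known typical heights can serve as weak constraints on camera height and horizon placement, which is the kind of interplay the abstract describes.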


Author information

Corresponding author

Correspondence to Rui Zhu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 70663 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhu, R., et al. (2020). Single View Metrology in the Wild. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol. 12356. Springer, Cham. https://doi.org/10.1007/978-3-030-58621-8_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58621-8_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58620-1

  • Online ISBN: 978-3-030-58621-8

  • eBook Packages: Computer Science, Computer Science (R0)
