Abstract
Most 3D reconstruction methods can recover scene properties only up to a global scale ambiguity. We present a novel approach to single view metrology that recovers the absolute scale of a scene, represented by the 3D heights of objects or the camera height above the ground, as well as the camera's orientation and field of view, from a single monocular image acquired under unconstrained conditions. Our method relies on data-driven priors learned by a deep network specifically designed to absorb weakly supervised constraints arising from the interplay between the unknown camera and 3D entities such as object heights, through the estimation of bounding box projections. We leverage categorical priors for objects that commonly occur in natural images, such as humans and cars, as references for scale estimation. We demonstrate state-of-the-art qualitative and quantitative results on several datasets, as well as applications including virtual object insertion. Furthermore, the perceptual quality of our outputs is validated by a user study.
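The scale-from-reference idea underlying this line of work can be illustrated with the classical single-view metrology relation (a textbook result, not the paper's learned model): assuming a pinhole camera with negligible roll and an object resting on the ground plane, the ratio of image heights below the horizon equals the ratio of world heights relative to the camera height. A minimal sketch:

```python
def object_height(y_top, y_bottom, y_horizon, camera_height):
    """Classical single-view metrology height relation.

    For an object standing on the ground plane, viewed by a pinhole
    camera with negligible roll, similar triangles give
        h = H * (y_bottom - y_top) / (y_bottom - y_horizon),
    where H is the camera height above the ground and image y
    coordinates grow downward (so y_bottom > y_top > y_horizon
    for an upright object below the horizon).
    """
    return camera_height * (y_bottom - y_top) / (y_bottom - y_horizon)

# Hypothetical example: a bounding box spanning rows 300 (head) to
# 500 (feet), horizon at row 260, camera 1.6 m above the ground.
h = object_height(y_top=300, y_bottom=500, y_horizon=260, camera_height=1.6)
# h = 1.6 * 200 / 240, i.e. about 1.33 m
```

The same relation can be inverted: given a reference object of known categorical height (e.g. an average person), it yields the camera height, which is one way weak supervision on bounding boxes can constrain absolute scale.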
© 2020 Springer Nature Switzerland AG
Cite this paper
Zhu, R. et al. (2020). Single View Metrology in the Wild. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12356. Springer, Cham. https://doi.org/10.1007/978-3-030-58621-8_19
Print ISBN: 978-3-030-58620-1
Online ISBN: 978-3-030-58621-8