Abstract
Inferring depth from a single image is a difficult task in computer vision because it must rely on the monocular cues contained in the image. Inspired by the work of Saxena et al., this paper presents a close-form iterative algorithm that alternates between multi-scale image segmentation and depth inference, which can significantly improve both the segmentation and the depth estimates. First, an EM-based algorithm is applied to obtain an initial multi-scale image segmentation. Then, a multi-scale Markov random field (MRF) model, trained by supervised learning, is used to infer both the depths and the relations between depths at different image regions. Next, a graph-based region merging algorithm merges the segments at the larger scales by incorporating the inferred depths. Finally, the refined multi-scale segmentations are used as input to the MRF model and the depths are re-inferred. These steps are repeated until the expected results are achieved. Because the segmentation at the finest scale does not change during the iterations, the method can still capture detailed 3D structure. Meanwhile, the refined segmentations at the other scales help capture more of the global structure in the image. Comparative experiments verify the validity of our method: it infers quantitatively better depth estimates for 62.7% of 134 images downloaded from Saxena’s database. Our method can also improve the image segmentation results in the sense of scene interpretation. Moreover, the paper extends the method to estimate the depth of scenes with foreground objects.
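The alternation between depth inference and depth-aware region merging described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single coarse scale, replaces the trained multi-scale MRF with a caller-supplied function (the hypothetical `infer_depth`), uses a plain fixed threshold on the difference of mean region depths as the merging rule, and stops when a pass merges nothing. All names, the threshold, and the stopping rule are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple


def merge_regions_by_depth(
    labels: Sequence[int],
    depths: Sequence[float],
    edges: Sequence[Tuple[int, int]],
    threshold: float,
) -> List[int]:
    """Greedily merge adjacent regions whose mean depths differ by less than threshold.

    labels[i] is the region of site i, depths[i] its inferred depth, and
    edges lists pairs of adjacent sites (assumed inputs for this sketch).
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for lab, d in zip(labels, depths):
        total[lab] += d
        count[lab] += 1
    parent = {lab: lab for lab in count}

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def mean(r: int) -> float:
        return total[r] / count[r]

    # Region adjacency graph, visited from the most similar pair to the least.
    region_edges = {
        tuple(sorted((labels[i], labels[j])))
        for i, j in edges
        if labels[i] != labels[j]
    }
    for a, b in sorted(region_edges, key=lambda e: abs(mean(e[0]) - mean(e[1]))):
        ra, rb = find(a), find(b)
        if ra != rb and abs(mean(ra) - mean(rb)) < threshold:
            parent[rb] = ra
            total[ra] += total[rb]
            count[ra] += count[rb]
    return [find(lab) for lab in labels]


def alternate(
    fine: Sequence[int],
    coarse: Sequence[int],
    edges: Sequence[Tuple[int, int]],
    infer_depth: Callable[[Sequence[int], Sequence[int]], Sequence[float]],
    threshold: float,
    max_iters: int = 10,
) -> Tuple[List[int], List[float]]:
    """Alternate depth inference and coarse-scale merging; fine labels stay fixed."""
    coarse = list(coarse)
    depths = list(infer_depth(fine, coarse))
    for _ in range(max_iters):
        merged = merge_regions_by_depth(coarse, depths, edges, threshold)
        if len(set(merged)) == len(set(coarse)):
            break  # no regions were merged in this pass
        coarse = merged
        depths = list(infer_depth(fine, coarse))  # re-infer with refined segments
    return coarse, depths
```

For example, four sites in a chain with depths 1.0, 1.1, 5.0 and 5.2 and a threshold of 0.5 collapse into two regions, a near one and a far one, while the finest-scale labels are passed through unchanged.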
References
Saxena, A., Chung, S.H., Ng, A.Y.: Learning depth from single monocular images. In: Neural Information Processing Systems (NIPS), vol. 18 (2005)
Saxena, A., Chung, S.H., Ng, A.Y.: 3-d depth reconstruction from a single still image. International Journal of Computer Vision (IJCV) 76, 53–69 (2007)
Saxena, A., Schulte, J., Ng, A.Y.: Depth estimation using monocular and stereo cues. In: International Joint Conference on Artificial Intelligence (IJCAI) (2007)
Saxena, A., Sun, M., Ng, A.Y.: Make3d: depth perception from a single still image. In: AAAI Conference on Artificial Intelligence (AAAI) (2008)
Saxena, A., Sun, M., Ng, A.Y.: Make3d: Learning 3-d scene structure from a single still image. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 31, 824–840 (2008)
Hoiem, D., Efros, A.A., Hebert, M.: Automatic photo pop-up. In: International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH) (2005)
Hoiem, D., Efros, A.A., Hebert, M.: Geometric context from a single image. In: International Conference on Computer Vision (ICCV) (2005)
Hedau, V., Hoiem, D., Forsyth, D.: Recovering the spatial layout of cluttered rooms. In: International Conference on Computer Vision (ICCV) (2009)
Hoiem, D., Efros, A.A., Hebert, M.: Closing the loop on scene interpretation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008)
Hoiem, D., Efros, A.A., Hebert, M.: Putting objects in perspective. International Journal of Computer Vision (IJCV) 80, 3–15 (2008)
Malik, J., Rosenholtz, R.: Computing local surface orientation and shape from texture for curved surfaces. International Journal of Computer Vision (IJCV) 23, 149–168 (1997)
Malik, J., Perona, P.: Preattentive texture discrimination with early vision mechanisms. Journal of the Optical Society of America A 7, 923–932 (1990)
Maki, A., Watanabe, M., Wiles, C.: Geotensity: Combining motion and lighting for 3d surface reconstruction. International Journal of Computer Vision (IJCV) 48, 75–90 (2002)
Zhang, R., Tsai, P.S., Cryer, J.E., Shah, M.: Shape from shading: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 21, 690–706 (1999)
Horry, Y., Anjyo, K.I., Arai, K.: Tour into the picture: using a spidery mesh interface to make animation from a single image. In: International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH) (1997)
Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision (IJCV) 47, 7–42 (2002)
Forsyth, D., Ponce, J.: Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference (2002)
Delage, E., Lee, H., Ng, A.: A dynamic bayesian network model for autonomous 3d reconstruction from a single indoor image. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2006)
Sudderth, E.B., Torralba, A., Freeman, W.T., Willsky, A.S.: Depth from familiar objects: A hierarchical model for 3d scenes. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2006)
Michels, J., Saxena, A., Ng, A.Y.: High speed obstacle avoidance using monocular vision and reinforcement learning. In: International Conference on Machine Learning (ICML) (2005)
Heitz, G., Gould, S., Saxena, A., Koller, D.: Cascaded classification models: Combining models for holistic scene understanding. In: Neural Information Processing Systems (NIPS) (2008)
Carson, C., Belongie, S., Greenspan, H., Malik, J.: Blobworld: image segmentation using expectation-maximization and its application to image querying. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 24, 1026–1038 (2002)
Garding, J., Lindeberg, T.: Direct computation of shape cues using scale-adapted spatial derivative operators. International Journal of Computer Vision (IJCV) 17, 163–191 (1996)
Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. International Journal of Computer Vision (IJCV) 59, 167–181 (2004)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cao, Y., Xia, Y., Wang, Z. (2010). A Close-Form Iterative Algorithm for Depth Inferring from a Single Image. In: Daniilidis, K., Maragos, P., Paragios, N. (eds) Computer Vision – ECCV 2010. ECCV 2010. Lecture Notes in Computer Science, vol 6315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15555-0_53
DOI: https://doi.org/10.1007/978-3-642-15555-0_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15554-3
Online ISBN: 978-3-642-15555-0
eBook Packages: Computer Science (R0)