RDNeRF: relative depth guided NeRF for dense free view synthesis

  • Original article
  • The Visual Computer

Abstract

In this paper, we focus on dense view synthesis with free camera movements in indoor scenes, which enables richer user interaction than sparse views. Neural radiance fields (NeRF) handle sparsely and spherically captured scenes well but struggle in scenes captured from dense free views. We extend NeRF to handle such views of indoor scenes. We present a learning-based approach named relative depth guided NeRF (RDNeRF), which jointly renders RGB images and recovers scene geometry under dense free views. To recover the geometry of each view without ground-truth depth, we propose to directly learn relative depth with implicit functions and transform it into a geometric volume bound for geometry-aware sampling and integration in NeRF. With correct scene geometry, we further model the implicit internal relevance of the inputs to enhance the representation ability of NeRF under dense free views. We conduct extensive experiments on indoor scenes for dense free view synthesis. RDNeRF outperforms current state-of-the-art methods, achieving a PSNR of 24.95 and an SSIM of 0.77. In addition, it recovers more accurate geometry than the baseline models.
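The core idea of turning a relative depth prediction into a geometric bound for sampling can be illustrated with a short sketch. The following Python/PyTorch snippet is only a minimal illustration of geometry-aware ray sampling, not the authors' implementation: the function name sample_points_with_depth_bound, the parameters near, far and margin, and the linear rescaling of relative depth to the metric range are all assumptions made for this example.

    # Minimal sketch (assumed, not the authors' code): use a per-ray relative
    # depth prediction to bound where NeRF places its samples along each ray.
    import torch

    def sample_points_with_depth_bound(rays_o, rays_d, rel_depth,
                                       near=0.1, far=10.0,
                                       n_samples=64, margin=0.25):
        # rays_o, rays_d: (N, 3) ray origins and unit directions.
        # rel_depth:      (N,) relative depth in [0, 1]; here it is linearly
        #                 rescaled to [near, far], one plausible way to turn a
        #                 relative value into a metric bound (an assumption).
        # margin:         half-width of the sampling interval around the
        #                 rescaled depth, as a fraction of (far - near).
        depth = near + rel_depth * (far - near)        # (N,)
        half = margin * (far - near)
        t_near = torch.clamp(depth - half, min=near)   # (N,)
        t_far = torch.clamp(depth + half, max=far)     # (N,)

        # Evenly spaced samples restricted to the per-ray bounded interval,
        # instead of spanning the full [near, far] range.
        steps = torch.linspace(0.0, 1.0, n_samples, device=rays_o.device)      # (S,)
        t_vals = t_near[:, None] + (t_far - t_near)[:, None] * steps[None, :]  # (N, S)

        # 3D sample positions along each ray.
        pts = rays_o[:, None, :] + rays_d[:, None, :] * t_vals[:, :, None]     # (N, S, 3)
        return pts, t_vals

Concentrating samples in a band around the predicted surface in this way avoids wasting samples in empty space; the paper's actual bound construction and integration scheme may differ in detail.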

Data availability

We used two common datasets in this work: 7-Scenes [16] (https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes) and ScanNet [8] (http://www.scan-net.org/).

References

  1. Aliev, K.A., Ulyanov, D., Lempitsky, V.: Neural point-based graphics. arXiv preprint arXiv:1906.08240 (2019)

  2. Andraghetti, L., Myriokefalitakis, P., Dovesi, P.L., et al.: Enhancing self-supervised monocular depth estimation with traditional visual odometry. In: 2019 International Conference on 3D Vision (3DV), pp. 424–433. IEEE (2019)

  3. Battiato, S., Curti, S., La Cascia, M., et al.: Depth map generation by image classification. In: Three-Dimensional Image Capture and Applications VI, International Society for Optics and Photonics, pp. 95–104 (2004)

  4. Chan, S., Shum, H.Y., Ng, K.T.: Image-based rendering and synthesis. IEEE Signal Process. Mag. 24(6), 22–33 (2007)

  5. Chen, D., Sang, X., Wang, P., et al.: Dense-view synthesis for three-dimensional light-field display based on unsupervised learning. Opt. Express 27(17), 24624–24641 (2019)

  6. Chen, W., Fu, Z., Yang, D., et al.: Single-image depth perception in the wild. arXiv preprint arXiv:1604.03901 (2016)

  7. Chen, Z., Wang, C., Guo, Y.C., et al.: Structnerf: Neural radiance fields for indoor scenes with structural hints. arXiv preprint arXiv:2209.05277 (2022)

  8. Dai, A., Chang, A.X., Savva, M., et al.: Scannet: Richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828–5839 (2017)

  9. Dai, P., Zhang, Y., Li, Z., et al.: Neural point cloud rendering via multi-plane projection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7830–7839 (2020)

  10. Deng, K., Liu, A., Zhu, J.Y., et al.: Depth-supervised nerf: Fewer views and faster training for free. arXiv preprint arXiv:2107.02791 (2021)

  11. DeVries, T., Bautista, M.A., Srivastava, N., et al.: Unconstrained scene generation with locally conditioned radiance fields. arXiv preprint arXiv:2104.00670 (2021)

  12. Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  13. Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2650–2658 (2015)

  14. Flynn, J., Broxton, M., Debevec, P., et al.: Deepview: View synthesis with learned gradient descent. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2367–2376 (2019)

  15. Fu, H., Gong, M., Wang, C., et al.: Deep ordinal regression network for monocular depth estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2002–2011 (2018)

  16. Glocker, B., Izadi, S., Shotton, J., et al.: Real-time RGB-D camera relocalization. In: 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 173–179. IEEE (2013)

  17. Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 270–279 (2017)

  18. Gordon, A., Li, H., Jonschkowski, R., et al.: Depth from videos in the wild: Unsupervised monocular depth learning from unknown cameras. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8977–8986 (2019)

  19. Hedman, P., Philip, J., Price, T., et al.: Deep blending for free-viewpoint image-based rendering. ACM Trans. Graph. (TOG) 37(6), 1–15 (2018)

  20. Kajiya, J.T., Von Herzen, B.P.: Ray tracing volume densities. ACM SIGGRAPH Comput. Graph. 18(3), 165–174 (1984)

  21. Laina, I., Rupprecht, C., Belagiannis, V., et al.: Deeper depth prediction with fully convolutional residual networks. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 239–248. IEEE (2016)

  22. Lindell, D.B., Martel, J.N., Wetzstein, G.: Autoint: Automatic integration for fast neural volume rendering. arXiv preprint arXiv:2012.01714 (2020)

  23. Liu, F., Shen, C., Lin, G., et al.: Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2024–2039 (2015)

  24. Liu, L., Gu, J., Lin, K.Z., et al.: Neural sparse voxel fields. arXiv preprint arXiv:2007.11571 (2020)

  25. Martin-Brualla, R., Radwan, N., Sajjadi, M.S., et al.: Nerf in the wild: Neural radiance fields for unconstrained photo collections. arXiv preprint arXiv:2008.02268 (2020)

  26. Max, N.: Optical models for direct volume rendering. IEEE Trans. Visual Comput. Graph. 1(2), 99–108 (1995)

  27. Mildenhall, B., Srinivasan, P.P., Tancik, M., et al.: Nerf: Representing scenes as neural radiance fields for view synthesis. In: European Conference on Computer Vision, pp. 405–421. Springer (2020)

  28. Neff, T., Stadlbauer, P., Parger, M., et al.: Donerf: Towards real-time rendering of compact neural radiance fields using depth oracle networks. In: Computer Graphics Forum, Wiley Online Library, pp. 45–59 (2021)

  29. Nguyen, H.T., Do, M.N.: Error analysis for image-based rendering with depth information. IEEE Trans. Image Process. 18(4), 703–716 (2009)

  30. Penner, E., Zhang, L.: Soft 3d reconstruction for view synthesis. ACM Trans. Graph. (TOG) 36(6), 1–11 (2017)

  31. Pumarola, A., Corona, E., Pons-Moll, G., et al.: D-nerf: Neural radiance fields for dynamic scenes. arXiv preprint arXiv:2011.13961 (2020)

  32. Qi, X., Liao, R., Liu, Z., et al.: Geonet: Geometric neural network for joint depth and surface normal estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 283–291 (2018)

  33. Ranftl, R., Lasinger, K., Hafner, D., et al.: Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. (2020)

  34. Reizenstein, J., Shapovalov, R., Henzler, P., et al.: Common objects in 3D: Large-scale learning and evaluation of real-life 3D category reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10901–10911 (2021)

  35. Riegler, G., Koltun, V.: Free view synthesis. In: European Conference on Computer Vision (2020)

  36. Riegler, G., Koltun, V.: Stable view synthesis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2021)

  37. Roessle, B., Barron, J.T., Mildenhall, B., et al.: Dense depth priors for neural radiance fields from sparse input views. arXiv preprint arXiv:2112.03288 (2021)

  38. Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4104–4113 (2016)

  39. Srinivasan, P.P., Tucker, R., Barron, J.T., et al.: Pushing the boundaries of view extrapolation with multiplane images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 175–184 (2019)

  40. Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)

  41. Wang, C., Lucey, S., Perazzi, F., et al.: Web stereo video supervision for depth prediction from dynamic scenes. In: 2019 International Conference on 3D Vision (3DV), pp. 348–357. IEEE (2019)

  42. Wang, P., Chen, X., Chen, T., et al.: Is attention all nerf needs? arXiv preprint arXiv:2207.13298 (2022)

  43. Wang, Z., Bovik, A.C., Sheikh, H.R., et al.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)

  44. Wei, Y., Liu, S., Rao, Y., et al.: Nerfingmvs: Guided optimization of neural radiance fields for indoor multi-view stereo. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5610–5619 (2021)

  45. Wu, X., Xu, J., Zhu, Z., et al.: Scalable neural indoor scene rendering. ACM Trans. Graph. (TOG) 41(4), 1–16 (2022)

  46. Yin, W., Zhang, J., Wang, O., et al.: Learning to recover 3d scene shape from a single image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 204–213 (2021)

  47. Yu, A., Ye, V., Tancik, M., et al.: pixelnerf: Neural radiance fields from one or few images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4578–4587 (2021)

  48. Zhang, C., Chen, T.: A survey on image-based rendering-representation, sampling and compression. Signal Process. Image Commun. 19(1), 1–28 (2004)

  49. Zhang, K., Riegler, G., Snavely, N., et al.: Nerf++: Analyzing and improving neural radiance fields. arXiv preprint arXiv:2010.07492 (2020)

  50. Zhang, R., Isola, P., Efros, A.A., et al.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–595 (2018)

  51. Zhou, T., Brown, M., Snavely, N., et al.: Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1851–1858 (2017)

  52. Zhou, T., Tucker, R., Flynn, J., et al.: Stereo magnification: Learning view synthesis using multiplane images. arXiv preprint arXiv:1805.09817 (2018)

Funding

This work was supported by the National Key Research and Development Program of China (Grant No. 2018AAA0100400) and the National Natural Science Foundation of China (NSFC, Grant Nos. 61922046 and 62132012).

Author information

Contributions

JQ and YZ conceived the work, designed the analysis, and wrote the manuscript; P-TJ and BR contributed to review and editing; M-MC assisted in supervision.

Corresponding author

Correspondence to Bo Ren.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qiu, J., Zhu, Y., Jiang, PT. et al. RDNeRF: relative depth guided NeRF for dense free view synthesis. Vis Comput 40, 1485–1497 (2024). https://doi.org/10.1007/s00371-023-02863-5
