RDNeRF: relative depth guided NeRF for dense free view synthesis

  • Original article
  • The Visual Computer

Abstract

In this paper, we focus on dense view synthesis with free camera movements in indoor scenes, which enables richer user interaction than sparse views. Neural radiance fields (NeRF) handle sparsely and spherically captured scenes well but struggle in scenes captured from dense free views. We extend NeRF to handle such views of indoor scenes. We present a learning-based approach named relative depth guided NeRF (RDNeRF), which jointly renders RGB images and recovers scene geometry under dense free views. To recover the geometry of each view without ground-truth depth, we propose to directly learn relative depth with implicit functions and transform it into a geometric volume bound for geometry-aware sampling and integration in NeRF. With correct scene geometry, we further model the implicit internal relevance of the inputs to enhance the representation ability of NeRF under dense free views. We conduct extensive experiments on indoor scenes for dense free view synthesis. RDNeRF outperforms current state-of-the-art methods, achieving a PSNR of 24.95 and an SSIM of 0.77. In addition, it recovers more accurate geometry than the baseline models.
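The core idea of turning a relative depth prediction into a geometric bound for sampling can be illustrated with a short sketch. The following Python/PyTorch snippet is only a minimal illustration of geometry-aware ray sampling, not the authors' implementation: the function name sample_points_with_depth_bound, the parameters near, far and margin, and the linear rescaling of relative depth to the metric range are all assumptions made for this example.

    # Minimal sketch (assumed, not the authors' code): use a per-ray relative
    # depth prediction to bound where NeRF places its samples along each ray.
    import torch

    def sample_points_with_depth_bound(rays_o, rays_d, rel_depth,
                                       near=0.1, far=10.0,
                                       n_samples=64, margin=0.25):
        # rays_o, rays_d: (N, 3) ray origins and unit directions.
        # rel_depth:      (N,) relative depth in [0, 1]; here it is linearly
        #                 rescaled to [near, far], one plausible way to turn a
        #                 relative value into a metric bound (an assumption).
        # margin:         half-width of the sampling interval around the
        #                 rescaled depth, as a fraction of (far - near).
        depth = near + rel_depth * (far - near)        # (N,)
        half = margin * (far - near)
        t_near = torch.clamp(depth - half, min=near)   # (N,)
        t_far = torch.clamp(depth + half, max=far)     # (N,)

        # Evenly spaced samples restricted to the per-ray bounded interval,
        # instead of spanning the full [near, far] range.
        steps = torch.linspace(0.0, 1.0, n_samples, device=rays_o.device)      # (S,)
        t_vals = t_near[:, None] + (t_far - t_near)[:, None] * steps[None, :]  # (N, S)

        # 3D sample positions along each ray.
        pts = rays_o[:, None, :] + rays_d[:, None, :] * t_vals[:, :, None]     # (N, S, 3)
        return pts, t_vals

Concentrating samples in a band around the predicted surface in this way avoids wasting samples in empty space; the paper's actual bound construction and integration scheme may differ in detail.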

Data availability

We used two common datasets in this work: 7-Scenes [16] (https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes) and ScanNet [8] (http://www.scan-net.org/).

References

  1. Aliev, K.A., Ulyanov, D., Lempitsky, V.: Neural point-based graphics. arXiv preprint arXiv:1906.08240 (2019)

  2. Andraghetti, L., Myriokefalitakis, P., Dovesi, P.L., et al.: Enhancing self-supervised monocular depth estimation with traditional visual odometry. In: 2019 International Conference on 3D Vision (3DV), pp. 424–433. IEEE (2019)

  3. Battiato, S., Curti, S., La Cascia, M., et al.: Depth map generation by image classification. In: Three-Dimensional Image Capture and Applications VI, International Society for Optics and Photonics, pp. 95–104 (2004)

  4. Chan, S., Shum, H.Y., Ng, K.T.: Image-based rendering and synthesis. IEEE Signal Process. Mag. 24(6), 22–33 (2007)

  5. Chen, D., Sang, X., Wang, P., et al.: Dense-view synthesis for three-dimensional light-field display based on unsupervised learning. Opt. Express 27(17), 24624–24641 (2019)

  6. Chen, W., Fu, Z., Yang, D., et al.: Single-image depth perception in the wild. arXiv preprint arXiv:1604.03901 (2016)

  7. Chen, Z., Wang, C., Guo, Y.C., et al.: Structnerf: Neural radiance fields for indoor scenes with structural hints. arXiv preprint arXiv:2209.05277 (2022)

  8. Dai, A., Chang, A.X., Savva, M., et al.: Scannet: Richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828–5839 (2017)

  9. Dai, P., Zhang, Y., Li, Z., et al.: Neural point cloud rendering via multi-plane projection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7830–7839 (2020)

  10. Deng, K., Liu, A., Zhu, J.Y., et al.: Depth-supervised nerf: Fewer views and faster training for free. arXiv preprint arXiv:2107.02791 (2021)

  11. DeVries, T., Bautista, M.A., Srivastava, N., et al.: Unconstrained scene generation with locally conditioned radiance fields. arXiv preprint arXiv:2104.00670 (2021)

  12. Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  13. Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2650–2658 (2015)

  14. Flynn, J., Broxton, M., Debevec, P., et al.: Deepview: View synthesis with learned gradient descent. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2367–2376 (2019)

  15. Fu, H., Gong, M., Wang, C., et al.: Deep ordinal regression network for monocular depth estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2002–2011 (2018)

  16. Glocker, B., Izadi, S., Shotton, J., et al.: Real-time RGB-D camera relocalization. In: 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 173–179. IEEE (2013)

  17. Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 270–279 (2017)

  18. Gordon, A., Li, H., Jonschkowski, R., et al.: Depth from videos in the wild: Unsupervised monocular depth learning from unknown cameras. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8977–8986 (2019)

  19. Hedman, P., Philip, J., Price, T., et al.: Deep blending for free-viewpoint image-based rendering. ACM Trans. Graph. (TOG) 37(6), 1–15 (2018)

  20. Kajiya, J.T., Von Herzen, B.P.: Ray tracing volume densities. ACM SIGGRAPH Comput. Graph. 18(3), 165–174 (1984)

  21. Laina, I., Rupprecht, C., Belagiannis, V., et al.: Deeper depth prediction with fully convolutional residual networks. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 239–248. IEEE (2016)

  22. Lindell, D.B., Martel, J.N., Wetzstein, G.: Autoint: Automatic integration for fast neural volume rendering. arXiv preprint arXiv:2012.01714 (2020)

  23. Liu, F., Shen, C., Lin, G., et al.: Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2024–2039 (2015)

  24. Liu, L., Gu, J., Lin, K.Z., et al.: Neural sparse voxel fields. arXiv preprint arXiv:2007.11571 (2020)

  25. Martin-Brualla, R., Radwan, N., Sajjadi, M.S., et al.: Nerf in the wild: Neural radiance fields for unconstrained photo collections. arXiv preprint arXiv:2008.02268 (2020)

  26. Max, N.: Optical models for direct volume rendering. IEEE Trans. Visual Comput. Graph. 1(2), 99–108 (1995)

  27. Mildenhall, B., Srinivasan, P.P., Tancik, M., et al.: Nerf: Representing scenes as neural radiance fields for view synthesis. In: European Conference on Computer Vision, pp. 405–421. Springer (2020)

  28. Neff, T., Stadlbauer, P., Parger, M., et al.: Donerf: Towards real-time rendering of compact neural radiance fields using depth oracle networks. In: Computer Graphics Forum, Wiley Online Library, pp. 45–59 (2021)

  29. Nguyen, H.T., Do, M.N.: Error analysis for image-based rendering with depth information. IEEE Trans. Image Process. 18(4), 703–716 (2009)

  30. Penner, E., Zhang, L.: Soft 3d reconstruction for view synthesis. ACM Trans. Graph. (TOG) 36(6), 1–11 (2017)

  31. Pumarola, A., Corona, E., Pons-Moll, G., et al.: D-nerf: Neural radiance fields for dynamic scenes. arXiv preprint arXiv:2011.13961 (2020)

  32. Qi, X., Liao, R., Liu, Z., et al.: Geonet: Geometric neural network for joint depth and surface normal estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 283–291 (2018)

  33. Ranftl, R., Lasinger, K., Hafner, D., et al.: Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. (2020)

  34. Reizenstein, J., Shapovalov, R., Henzler, P., et al.: Common objects in 3D: Large-scale learning and evaluation of real-life 3D category reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10901–10911 (2021)

  35. Riegler, G., Koltun, V.: Free view synthesis. In: European Conference on Computer Vision (2020)

  36. Riegler, G., Koltun, V.: Stable view synthesis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2021)

  37. Roessle, B., Barron, J.T., Mildenhall, B., et al.: Dense depth priors for neural radiance fields from sparse input views. arXiv preprint arXiv:2112.03288 (2021)

  38. Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4104–4113 (2016)

  39. Srinivasan, P.P., Tucker, R., Barron, J.T., et al.: Pushing the boundaries of view extrapolation with multiplane images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 175–184 (2019)

  40. Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)

  41. Wang, C., Lucey, S., Perazzi, F., et al.: Web stereo video supervision for depth prediction from dynamic scenes. In: 2019 International Conference on 3D Vision (3DV), pp. 348–357. IEEE (2019)

  42. Wang, P., Chen, X., Chen, T., et al.: Is attention all nerf needs? arXiv preprint arXiv:2207.13298 (2022)

  43. Wang, Z., Bovik, A.C., Sheikh, H.R., et al.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)

  44. Wei, Y., Liu, S., Rao, Y., et al.: Nerfingmvs: Guided optimization of neural radiance fields for indoor multi-view stereo. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5610–5619 (2021)

  45. Wu, X., Xu, J., Zhu, Z., et al.: Scalable neural indoor scene rendering. ACM Trans. Graph. (TOG) 41(4), 1–16 (2022)

  46. Yin, W., Zhang, J., Wang, O., et al.: Learning to recover 3d scene shape from a single image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 204–213 (2021)

  47. Yu, A., Ye, V., Tancik, M., et al.: pixelnerf: Neural radiance fields from one or few images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4578–4587 (2021)

  48. Zhang, C., Chen, T.: A survey on image-based rendering-representation, sampling and compression. Signal Process. Image Commun. 19(1), 1–28 (2004)

  49. Zhang, K., Riegler, G., Snavely, N., et al.: Nerf++: Analyzing and improving neural radiance fields. arXiv preprint arXiv:2010.07492 (2020)

  50. Zhang, R., Isola, P., Efros, A.A., et al.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–595 (2018)

  51. Zhou, T., Brown, M., Snavely, N., et al.: Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1851–1858 (2017)

  52. Zhou, T., Tucker, R., Flynn, J., et al.: Stereo magnification: Learning view synthesis using multiplane images. arXiv preprint arXiv:1805.09817 (2018)

Funding

This work was supported by the National Key Research and Development Program of China (Grant No. 2018AAA0100400) and the National Natural Science Foundation of China (NSFC, Grant Nos. 61922046 and 62132012).

Author information

Contributions

JQ and YZ conceived the work, designed the analysis, and wrote the manuscript; P-TJ and BR contributed to review and editing; M-MC assisted in supervision.

Corresponding author

Correspondence to Bo Ren.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qiu, J., Zhu, Y., Jiang, PT. et al. RDNeRF: relative depth guided NeRF for dense free view synthesis. Vis Comput 40, 1485–1497 (2024). https://doi.org/10.1007/s00371-023-02863-5
