
Deep learning-based 3D reconstruction: a survey

Artificial Intelligence Review


Image-based 3D reconstruction is a long-standing, ill-posed problem in computer vision and graphics. Its goal is to recover the 3D structure and geometry of a target object or scene from a set of input images. The task has applications in many fields, including robotics, virtual reality, and medical imaging. In recent years, learning-based methods for 3D reconstruction have attracted considerable research attention. These methods can implicitly estimate the 3D shape of an object or a scene in an end-to-end manner, eliminating the need to develop multiple separate stages such as key-point detection and matching. Furthermore, they can reconstruct the shape of an object from a single input image. Given the rapid advances in this field and the many opportunities to improve the performance of 3D reconstruction methods, a thorough review of algorithms in this area is warranted. This survey therefore provides a comprehensive overview of recent developments in image-based 3D reconstruction. The methods are examined from several viewpoints, including input types, model structures, output representations, and training strategies. A detailed comparison is also provided. Finally, unresolved challenges, underlying issues, and possible future work are discussed.





The authors did not receive support from any organization for the submitted work.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Mohsen Soryani.

Ethics declarations

Conflict of interest

The authors confirm that there is no conflict of interest in publishing this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Samavati, T., Soryani, M. Deep learning-based 3D reconstruction: a survey. Artif Intell Rev 56, 9175–9219 (2023). https://doi.org/10.1007/s10462-023-10399-2


  • DOI: https://doi.org/10.1007/s10462-023-10399-2

