1 Introduction

Methods for reconstructing 3D objects from 2D images and videos have undergone remarkable improvements in recent years. In general, these proposals use specific databases for each object type, although there is a trend toward developing general methods that compute 3D reconstructions for any object type.

Segmentation networks such as DeepLabv3+ [11] can be used to segment guitars from images. Encoder–decoder networks usually consist of two phases: first, the feature maps are reduced to capture the semantic information; then, the spatial information is recovered by upsampling techniques. This approach has proven successful in segmentation [6, 7, 12, 13]. The Xception module, which modifies Inception V3 to improve performance on large data sets, is now used as the main backbone in server environments [5]. In encoder architectures, low-resolution features are separated from higher-resolution ones and recovered using the decoder. Another approach argues that high-resolution representations should be maintained throughout the process by using a parallel network that connects the stages of the process and helps to reconstruct these features at the end. Recent work on high-resolution networks (HRNet) [14, 15] has shown very good performance.

2.2 3D object reconstruction

RGB image-based 3D reconstruction methods using convolutional neural networks (CNNs) have attracted increasing interest and shown impressive performance. Han et al. provided an overview of these methods in [16]. Hepperle et al. [17] examined the quality of 3D reconstructions for enhancing the experience in VR applications.

The development of deep learning techniques, and in particular, the increasing availability of large training data sets, has led to a new generation of methods capable of recovering the 3D geometry and structure of objects from one or more RGB images without the complex process of camera calibration.

2.2.1 Volumetric representation techniques

Volumetric representations partition the space around a 3D object into a 3D grid and allow existing deep learning architectures developed for 2D image analysis, such as encoder–decoders, to be used for 3D processes.

Some methods deal with 3D volume reconstruction in the form of a voxelized occupancy grid, for example, in [18, 19].

3.1 Guitar segmentation and classification

We defined a set of segmentation and classification methods to extract the information of the guitar region appearing in the image and to verify that the guitar to be reconstructed meets the minimum processing requirements. We follow a framework of weak classifiers that can be combined sequentially to simplify the creation of the databases and their generalization to various other objects.

The proposed method starts with the segmentation of the guitar from the image, and then, a chain of classifiers and segmentation methods is applied to this first segmentation: We classify the segmented guitar into frontal/non-frontal classes to check whether the guitar is frontal to the camera. If the classification reveals that the guitar is frontal, the process continues with a second classifier that detects whether the guitar is electric or classical. This determines the type of template needed to correctly fit and reconstruct the guitar model. Finally, another segmentation is performed to extract the detected regions of the classical/electric guitar. This segmentation is used to align the 3D template with the orientation of the guitar and improve edge fitting during 3D reconstruction.
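
As a minimal illustration (not the authors' implementation), this chain of weak classifiers can be sketched as a pipeline whose stages are injected as callables; all names below are hypothetical placeholders for the networks described in Sects. 3.1.1–3.1.4:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class GuitarPipeline:
    """Chain of weak classifiers/segmenters from Sect. 3.1.

    Each stage is injected as a callable, so the chain itself stays
    independent of the concrete networks (PGN, ResNet50, ...)."""
    segment_guitar: Callable[[Any], Optional[Any]]   # Sect. 3.1.1: guitar mask or None
    is_frontal: Callable[[Any, Any], bool]           # Sect. 3.1.2: frontal / non-frontal
    classify_type: Callable[[Any, Any], str]         # Sect. 3.1.3: 'classical' | 'electric'
    segment_regions: Callable[[Any, Any, str], Any]  # Sect. 3.1.4: region labels

    def run(self, image):
        mask = self.segment_guitar(image)
        if mask is None:
            return None                              # no guitar found in the image
        if not self.is_frontal(image, mask):
            return None                              # only frontal guitars are reconstructed
        gtype = self.classify_type(image, mask)
        regions = self.segment_regions(image, mask, gtype)
        return {"mask": mask, "type": gtype, "regions": regions}
```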

3.1.1 Guitar/non-guitar segmentation

To obtain a correct 3D reconstruction, an accurate segmentation of the guitar is required. We use the database and segmentation presented in [9], with 2,200 RGB images of guitars (11,000 images after data augmentation), to train and test the selected network. We randomly select 80% of the original data for training and the remaining 20% for testing. This database is also used for the classification methods explained in the following sections.

To obtain the best segmentation, we performed a full evaluation of three of the best-performing CNNs for segmentation: DeepLabv3+ [11], HRNet [14, 15] and PGN. Each CNN was trained from scratch with 40,000 iterations.

Table 1 Comparison on evaluation sets applying 40K training iterations

The performance of all networks with 40K iterations is shown in Table 1. As we can see, DeepLabv3+, HRNet and PGN achieved a mean Intersection over Union (mIoU) of 88.47%, 95.31% and 96.61%, respectively. Figure 2a shows an example of the guitar segmentation achieved.
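
For reference, the mIoU reported in Table 1 is the standard per-class intersection over union averaged over the classes present; a minimal NumPy sketch of the metric (framework-independent, not the authors' evaluation code) is:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection over Union between two integer label maps.

    pred, gt: HxW arrays of class indices. Classes absent from both
    maps are ignored when averaging."""
    ious = []
    for c in range(num_classes):
        p, g = (pred == c), (gt == c)
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue                                  # class not present in either map
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))
```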

Fig. 2

Examples of classification and segmentation. From left to right: a segmentation of guitar with [9]. b Frontal (top) and non-frontal (bottom) guitars. c Classical (top) and electric (bottom) guitars. d Labeling of the segmentation of the regions

Therefore, in our implementation, the PGN network is chosen to obtain a high-quality object segmentation. In Fig. 11, second column, we can see examples of guitar segmentation results obtained with this network.

3.1.2 Frontal/non-frontal guitar classification

To detect whether the guitar segmented in the previous step is frontal enough to be processed by our method, we developed a frontal/non-frontal classifier based on CNNs.

We use the guitar segmentation obtained in the previous step, cropped to a square block with a black background, as input to our classifier to determine whether the guitar image is frontal or non-frontal. As the classifier, we use ResNet50, a 50-layer residual network that has shown solid performance on appearance-based classification tasks [34]. The final number of samples per class after data augmentation was 123,625 frontal images and 151,000 non-frontal images.

Figure 2b shows an example of the images used in this classification process.

We adapted the database to the ResNet50 model and trained it on an NVIDIA Titan X GPU with 24 GB of memory for 6 epochs, with a batch size of 16, a stochastic gradient descent optimizer, and a learning rate of \(10^{-4}\). With this configuration, the CNN achieved a classification accuracy of 99.4%.
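
A minimal PyTorch sketch of this training configuration follows; the paper does not state the framework used or whether pre-trained weights were loaded, and the dataset object yielding (image, label) pairs is an assumption:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models

def train_frontal_classifier(train_dataset):
    """Train a 2-class (frontal / non-frontal) ResNet50.

    Hyperparameters follow the text: 6 epochs, batch size 16, SGD,
    learning rate 1e-4. `train_dataset` is assumed to yield
    (image_tensor, label) pairs already resized for ResNet50."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = models.resnet50(weights=None)            # pre-training not specified in the paper
    model.fc = nn.Linear(model.fc.in_features, 2)    # two output classes
    model = model.to(device)

    loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for _ in range(6):                               # 6 epochs
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```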

3.1.3 Classical/electric guitar classification

In our proposal, we use different 3D templates for classical and electric guitars to better fit the actual shape of the instrument. Thus, we need to determine the type of guitar so that the correct template can be applied. From the 989 frontal guitars extracted in Sect. 3.1.2, we obtained 470 classical and 519 electric guitars, of which 80% were randomly selected for training and the remaining 20% for testing. We then augmented them, obtaining 58,750 classical and 64,875 electric guitar images. Figure 2c shows sample images from this dataset. ResNet50 was trained with the same configuration as in Sect. 3.1.2, obtaining 98.3% accuracy.

3.1.4 Regions segmentation

This step is used to match the 3D template with the parts of the object, so that each region can be correctly located and placed when reconstructing the final 3D model. For each guitar type, we define different regions:

  • Classical guitar, five regions: Head, Neck, Body, Bridge and Hole.

  • Electric guitar, six regions: Head, Neck, Body, Bridge, Pickups and Controls.

Figure 2d shows a graphical representation of these regions.

To identify these regions in the segmented guitar, we use the PGN model [37, 38]. Since our templates are symmetric about the YZ plane, we do not need to check for a mirror transformation.

3.2.2 Boundary matching

The template is not a complete reconstruction of the model we are dealing with, but a rough approximation of its shape. Therefore, after aligning the template and the input silhouettes, we still need to find a boundary matching and perform silhouette warping in order to obtain any shape from the entire spectrum of possible shapes.

We need to find a boundary matching \(\omega \) between the silhouettes of our guitar template and the input guitar (see Fig. 4a). Given the contour of the segmented guitar \(\beta _g\), the pixels \(p_g \in \beta _g\) belonging to this contour, the contour of our template \(\beta _t\) and the pixels \(p_t \in \beta _t\) belonging to this contour, we want to warp \(\beta _t\) to its counterpart \(\beta _g\) to match the template to the real shape of the object. We are looking for a mapping \(\omega \) that defines the correspondence between the pixels belonging to \(\beta _g\) and \(\beta _{t,\omega }\) by minimizing the distance between all the associated pixels of the contour of the template and the real contour of the segmented guitar:

$$\begin{aligned} \mathop {\mathrm{arg\,min}}\limits _{\omega [0],\ldots ,\omega [m-1]} \sum _{i=0}^{m-1} \Vert p_{g,i}-p_{t,\omega [i]}\Vert _2 +\sigma (\omega [i],\omega [i+1]), \end{aligned}$$
(1)

where m is the number of pixels of the contour \(\beta _t\) and

$$\begin{aligned} \sigma (\omega [i],\omega [i+1]) = \left\{ \begin{array}{ll} 1, &{} \text {if } 0\le \omega [i+1]-\omega [i] \le k \\ \infty , &{} \text {otherwise} \end{array}\right. \end{aligned}$$
(2)

Therefore, \(\sigma (\omega [i],\omega [i+1])\) penalizes jumps between associations larger than k pixels. In our implementation, \(k = 128\) yields good results, but this value is closely tied to the working resolution we use (at most 350 \(\times \) 350 pixels).
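
One way to minimize Eq. (1) is dynamic programming over the contour indices. The sketch below is an illustration rather than the authors' solver: it treats both contours as open polylines (the wrap-around of closed contours is omitted) and keeps the constant \(\sigma\) penalty of Eq. (2) for feasible steps:

```python
import numpy as np

def boundary_matching(beta_g, beta_t, k=128):
    """Monotone boundary matching in the spirit of Eqs. (1)-(2).

    beta_g: (m, 2) array of input-contour pixels p_g.
    beta_t: (n, 2) array of template-contour pixels p_t.
    Returns omega, length-m array of matched template indices, under the
    constraint 0 <= omega[i+1] - omega[i] <= k."""
    m, n = len(beta_g), len(beta_t)
    # Pairwise Euclidean distances between the two contours.
    dist = np.linalg.norm(beta_g[:, None, :] - beta_t[None, :, :], axis=2)

    cost = np.full((m, n), np.inf)
    back = np.zeros((m, n), dtype=int)
    cost[0] = dist[0]
    for i in range(1, m):
        for j in range(n):
            lo = max(0, j - k)                        # allowed predecessors: j-k ... j
            prev = cost[i - 1, lo:j + 1]
            best = int(np.argmin(prev))
            cost[i, j] = dist[i, j] + prev[best] + 1.0  # +1.0 is the sigma penalty
            back[i, j] = lo + best

    # Backtrack the optimal association.
    omega = np.zeros(m, dtype=int)
    omega[-1] = int(np.argmin(cost[-1]))
    for i in range(m - 1, 0, -1):
        omega[i - 1] = back[i, omega[i]]
    return omega
```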

Fig. 4

Boundary matching between the silhouettes of our guitar template (green) and an input guitar (red). Each point represents a pixel of the boundary. a Boundary matching associations; b example of a bad association, where a pixel of the input guitar’s neck is associated with a pixel of the template guitar’s body

Depending on the value of k and the shape of the guitar, bad associations may occur, for example, when a pixel of the input guitar’s neck is matched to a pixel of the template guitar’s body (see Fig. 4b).

To solve this problem, we use the computed segmented regions. A boundary matching \(\omega _R\) is computed for the individual mask of each region R (with a smaller constraint \(k = 32\)), and the resulting mappings are combined. Each pixel \(p_g\) of the original silhouette \(\beta _g\) also belongs to the boundary \(\beta _{g,R}\) of at least one segmented region R, but a mapped pixel \(p_{t,R} \in \beta _{t,\omega }\) may or may not belong to the original silhouette of the template \(\beta _t\). We therefore keep those that belong to \(\beta _t\), obtaining an initial mapping \(\omega _\mathrm{init}\) whose gaps can easily be filled. Let u and v be two indices of \(\beta _g\) that have mappings \(\omega _\mathrm{init}[u]\) and \(\omega _\mathrm{init}[v]\) belonging to \(\beta _t\), with \(u < v\) and \(s = v - u\). We assign a mapping point in \(\beta _t\) to every index between u and v by decomposing the segment of \(\beta _t\) between \(\omega [u]\) and \(\omega [v]\) into s parts.
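
The gap-filling step can be sketched as a proportional (linear) interpolation of template indices between consecutive valid anchors. This is an illustration under stated assumptions: the index arrays are placeholders, and the wrap-around of a closed contour is ignored:

```python
import numpy as np

def fill_mapping_gaps(omega_init, valid):
    """Fill gaps in the partial mapping omega_init (Sect. 3.2.2).

    omega_init: int array of length m, mapping each beta_g index to a
    beta_t index where known. valid: boolean array marking the indices
    kept from the per-region matchings. Gaps between two valid indices
    u < v are filled by splitting the template span between
    omega_init[u] and omega_init[v] into s = v - u parts."""
    omega = omega_init.copy()
    anchors = np.flatnonzero(valid)
    for u, v in zip(anchors[:-1], anchors[1:]):
        s = v - u
        if s <= 1:
            continue                                  # no gap between these anchors
        # Proportional interpolation of the template indices.
        span = np.linspace(omega[u], omega[v], s + 1)
        omega[u + 1:v] = np.round(span[1:-1]).astype(int)
    return omega
```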

3.2.3 Occlusions

To handle the occlusions that musicians can create on guitars, we need to find an occlusion mask that indicates which parts of the guitar are occluded, but we also need to reconstruct the occluded parts of the boundary to obtain a reconstructed guitar mask. Finally, the map of the segmented regions must also be extended to cover the reconstructed mask.

Occlusion mask: We use PGN [40], taking the boundary pixels of each region and their matching pixels from the template (which are in turn computed with the boundary matching algorithm from Sect. 3.2.2) as pivots.

3.2.5 Meshing

Fig. 8

Stitching front and back meshes

We unproject each pixel of the warped depth maps to obtain the corresponding 3D vertex. Since warping the 2D silhouette can change the X and Y dimensions (making the silhouette larger or smaller in 2D), we scale the Z dimension accordingly to maintain the proportions of the guitar in all dimensions.

We create two triangles for each square of 4 pixels and obtain two meshes, one frontal and one posterior, which we stitch together along the silhouette (see Fig. 8).
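
A minimal sketch of this meshing step for a single (front or back) warped depth map follows. Stitching the two meshes along the silhouette and the Laplacian smoothing are omitted, and the pixel-grid unprojection shown here is an assumption about the exact camera model:

```python
import numpy as np

def depth_map_to_mesh(depth, mask, scale_z=1.0):
    """Build a triangle mesh from one warped depth map (front or back).

    depth: HxW float array; mask: HxW bool silhouette. Each valid pixel
    becomes a vertex (x, y, scale_z * depth); every 2x2 block of valid
    pixels yields two triangles. scale_z compensates the X/Y rescaling
    introduced by the 2D silhouette warping, as described in the text."""
    h, w = depth.shape
    index = -np.ones((h, w), dtype=int)
    verts = []
    for y in range(h):
        for x in range(w):
            if mask[y, x]:
                index[y, x] = len(verts)
                verts.append((x, y, scale_z * depth[y, x]))

    faces = []
    for y in range(h - 1):
        for x in range(w - 1):
            a, b = index[y, x], index[y, x + 1]
            c, d = index[y + 1, x], index[y + 1, x + 1]
            if min(a, b, c, d) >= 0:                  # all four corners inside the silhouette
                faces.append((a, b, c))               # two triangles per pixel square
                faces.append((b, d, c))
    return np.array(verts, dtype=float), np.array(faces, dtype=int)
```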

Finally, the entire mesh is smoothed using Laplacian smoothing.

3.2.6 Texture

The texture of the model can be obtained by directly projecting the texture of the masked guitar onto the front mesh. In this way, the quality and detail of the original image are preserved in the texture of the 3D model. There are several aspects to consider in this process. First, if an occlusion is present, we need to inpaint the input color image using the occlusion mask and the segmented regions. To do this, we fill each region of the occlusion mask by taking the largest possible patch from the unoccluded parts of the same region in the original color image. With this patch, we synthesize a texture that covers the corresponding region of the occlusion mask (using [41]), dilate it and paste it smoothly into the original occluded image. In this way, for example, the guitar body is inpainted using only patches of the body. Figure 9 shows an example of this process.

Fig. 9

Front texture inpainting

We limit our texture synthesis approach to regions where we can find a sufficiently large patch (between \(60\times 60\) and \(100\times 100\) pixels), and otherwise use Exemplar Inpainting [42, 43].
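
The paper does not describe how the largest usable patch is searched for; one plausible sketch (an assumption, not the authors' procedure) uses a chessboard distance transform to measure the largest unoccluded square inside a region and then applies the lower threshold above:

```python
import numpy as np
from scipy.ndimage import distance_transform_cdt

def largest_square_patch_side(region_free_mask):
    """Side length of the largest axis-aligned square of unoccluded
    pixels inside a region. The chessboard distance transform gives, at
    each pixel, the half-side of the largest centered square that fits."""
    dt = distance_transform_cdt(region_free_mask.astype(np.uint8), metric="chessboard")
    r = int(dt.max())
    return 2 * r - 1 if r > 0 else 0

def choose_inpainting_strategy(region_free_mask, min_side=60):
    """Mirror of the rule in the text: synthesize a texture from the
    largest available patch when it is at least min_side x min_side,
    otherwise fall back to exemplar-based inpainting [42, 43]."""
    side = largest_square_patch_side(region_free_mask)
    return "patch_synthesis" if side >= min_side else "exemplar_inpainting"
```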

The resulting inpainted texture is then projected onto the front mesh as a color texture. For the back texture, we use a similar strategy for inpainting the occluded parts: We use [41] to synthesize a texture that covers the entire silhouette using the largest possible patch in the guitar body region of the front texture. Thus, we assume that the back of each guitar has the same color and texture as the body. Figure 10 shows several examples of back textures.

When stitching the front and back meshes, we also ensure that the corresponding front and back boundary vertices have the same UV coordinates in the final texture. This results in faces that map to the boundary of the texture as if we were stretching those pixels.

To add additional detail and relief, we also compute a bump map from the resulting front and back textures (we compute the horizontal and vertical derivatives of the grayscale textures and multiply them by a strength factor). All these texture operations are performed at the same resolution as the original input image to preserve the maximum texture quality.
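
A minimal sketch of the bump-map computation as described (horizontal and vertical derivatives of the grayscale texture scaled by a strength factor); how the two derivative channels are packed for the renderer is our assumption:

```python
import numpy as np

def bump_map(gray, strength=1.0):
    """Bump map from a grayscale texture, following the description in
    the text: image derivatives multiplied by a strength factor.

    gray: HxW float array. Returns an HxWx2 array holding the scaled
    horizontal and vertical derivatives per pixel."""
    gy, gx = np.gradient(gray.astype(float))   # vertical and horizontal derivatives
    return np.stack([strength * gx, strength * gy], axis=-1)
```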

Fig. 10

Examples of synthesized back textures along with their extracted patches

4 Results and evaluation

The proposed system has been tested to evaluate its quality and numerical performance compared to other reference methods.

Figure 11 shows some results obtained with images from the Internet, while Fig. 12 compares some resulting models with the corresponding ground truth when renders of these models (taken from the ShapeNet database) are used as input. Reconstruction quality degrades under heavy occlusion: Fig. 11 shows examples in row 4, where the lower part of the guitar is not reconstructed correctly because it is occluded by the grass, and in row 7, where both hands occlude the guitar and the final reconstruction resolves these areas incompletely. If the segmentation of the guitar fails by inserting an element of the scene into the object mask, this element can be carried over into the final 3D model. This is the case in Fig. 11, row 3, where the neck of the guitar is not segmented correctly and the support is inserted into the reconstruction mask.

In terms of computational cost, our implementation performs each of the steps described in this work sequentially. Using a Windows 10 PC with an AMD Ryzen 7 3700X 8-core processor, 32 GB of RAM and an Nvidia RTX 2700 GPU, our setup can generate the 3D model of a guitar from an image in about 2 min. This runtime could be reduced through parallelization and code optimization.

6 Conclusions

In this paper, we presented a complete system for the 3D reconstruction of objects in frontal RGB images based on template deformation, using guitars as a case study to explain the method. It produces realistic 3D reconstructions in both shape and texture and handles occlusions that can hide parts of the object.

Unlike other reference methods, we work with both shape and texture and take into account occlusions present in the images. Therefore, the 3D models of our reconstructed guitars are accurate and realistic and can be used in 3D virtual reconstructions. Moreover, we have shown that our pipeline can be adapted to other objects, provided that a suitable 3D template and specific segmentation and classification techniques are used. Compared to other reference methods based mainly on CNNs, our proposal simplifies the 3D reconstruction process by requiring less data and training to obtain a realistic reconstruction of 3D objects.

For future improvements, we plan to address 3D reconstruction from other viewpoints and multiview configurations and to conduct a perceptual study to validate our reconstructions in a virtual environment. In summary, we believe that the work presented in this paper is a step toward automatic and realistic 3D object reconstruction and will be useful in creating 3D content for virtual reality.