Abstract
Semantic Scene Completion (SSC) aims to jointly infer the semantics and occupancy of 3D scenes. The Truncated Signed Distance Function (TSDF), a 3D encoding of depth, has been a common input for SSC. Fusing RGB with TSDF seems promising, since the two modalities provide color and geometry information, respectively. Nevertheless, RGB-TSDF fusion has been considered nontrivial, and the commonly used naive addition leads to inconsistent results. We argue that the inconsistency arises because RGB features become sparse when projected into 3D space while TSDF features are dense, so summing them produces imbalanced feature maps. To address this RGB-TSDF distribution difference, we propose a two-stage network with a 3D RGB feature completion module that completes RGB features with meaningful values for occluded areas. Moreover, we propose an effective classwise entropy loss function to penalize inconsistency. Extensive experiments on public datasets verify that our method achieves state-of-the-art performance among methods that do not use extra data.
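The sparsity argument in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' network or projection code. It assumes a pinhole camera, a synthetic flat wall 2 m from the camera, and a simplified one-dimensional TSDF; the names `project_features_to_voxels` and `toy_densities`, and every parameter value, are hypothetical choices made for illustration only.

```python
import numpy as np


def project_features_to_voxels(feat_2d, depth, fx, fy, cx, cy, voxel_size, grid_shape):
    """Scatter per-pixel features into a voxel grid (assumed pinhole camera model)."""
    h, w, c = feat_2d.shape
    v, u = np.mgrid[0:h, 0:w]
    # Back-project each pixel to a 3D point in camera coordinates.
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # Assumed grid layout: x and y centered on the camera axis, z starting at the camera.
    ix = np.floor(x / voxel_size).astype(int) + grid_shape[0] // 2
    iy = np.floor(y / voxel_size).astype(int) + grid_shape[1] // 2
    iz = np.floor(z / voxel_size).astype(int)
    valid = (
        (z > 0)
        & (ix >= 0) & (ix < grid_shape[0])
        & (iy >= 0) & (iy < grid_shape[1])
        & (iz >= 0) & (iz < grid_shape[2])
    )
    grid = np.zeros(tuple(grid_shape) + (c,), dtype=feat_2d.dtype)
    # Only voxels hit by a visible surface point receive a feature vector.
    grid[ix[valid], iy[valid], iz[valid]] = feat_2d[valid]
    return grid


def toy_densities():
    """Return the fraction of non-zero voxels for projected RGB features and a toy TSDF."""
    h = w = 60
    grid_shape = (40, 40, 40)
    voxel_size = 0.1   # metres per voxel (illustrative value)
    wall_z = 2.0       # synthetic scene: a flat wall 2 m in front of the camera
    depth = np.full((h, w), wall_z)
    feat_2d = np.ones((h, w, 8), dtype=np.float32)  # stand-in for 2D RGB features

    rgb_grid = project_features_to_voxels(
        feat_2d, depth, fx=60.0, fy=60.0, cx=30.0, cy=30.0,
        voxel_size=voxel_size, grid_shape=grid_shape,
    )
    rgb_density = np.any(rgb_grid != 0, axis=-1).mean()

    # Toy TSDF: signed distance to the wall along z, truncated to [-1, 1].
    z_centers = (np.arange(grid_shape[2]) + 0.5) * voxel_size
    tsdf_1d = np.clip((z_centers - wall_z) / (3 * voxel_size), -1.0, 1.0)
    tsdf = np.broadcast_to(tsdf_1d, grid_shape)
    tsdf_density = np.count_nonzero(tsdf) / tsdf.size
    return rgb_density, tsdf_density


rgb_density, tsdf_density = toy_densities()
print(f"non-zero RGB voxels:  {rgb_density:.4f}")
print(f"non-zero TSDF voxels: {tsdf_density:.4f}")
```

In this toy setting the projected features occupy only the thin layer of voxels on the visible surface, whereas the TSDF carries a value in essentially every voxel. Element-wise addition of the two volumes therefore mixes a mostly empty tensor with a dense one, which is the imbalance the abstract attributes to naive fusion.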
Acknowledgment
This work was partially supported by the Shenzhen Science and Technology Program (JCYJ20220818103006012, ZDSYS20211021111415025), the Shenzhen Institute of Artificial Intelligence and Robotics for Society, and the Research Foundation of Shenzhen Polytechnic University (6023312007K).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ding, L., Hu, P., Li, J., Huang, R. (2024). Towards Balanced RGB-TSDF Fusion for Consistent Semantic Scene Completion by 3D RGB Feature Completion and a Classwise Entropy Loss Function. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14426. Springer, Singapore. https://doi.org/10.1007/978-981-99-8432-9_11
DOI: https://doi.org/10.1007/978-981-99-8432-9_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8431-2
Online ISBN: 978-981-99-8432-9
eBook Packages: Computer Science, Computer Science (R0)