Abstract
Designing efficient deep learning models for 3D point clouds is an important research topic. Point-voxel convolution (Liu et al. in NeurIPS, 2019) is a pioneering approach in this direction, but it leaves considerable room for improvement: it relies on several layers of plain 3D convolutions and fuses point and voxel features with a simple linear operation. To address these issues, we propose a novel reparameterizable point-voxel convolution (RepPVConv) block. First, RepPVConv adopts two reparameterizable 3D convolution modules to extract more informative voxel features without introducing any extra computational overhead at inference time. The rationale is that these modules are trained in a high-capacity multi-branch form and then reparameterized into a compact single-branch form for inference, losslessly preserving the trained behavior. Second, RepPVConv attentively fuses the reparameterized voxel features with the point features. Because this fusion operates in a nonlinear manner, the descriptive reparameterized voxel features can be exploited more fully. Extensive experimental results show that RepPVConv-based networks are efficient in terms of both GPU memory consumption and computational complexity and significantly outperform state-of-the-art methods.
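The lossless train-to-inference conversion rests on the linearity of convolution: the outputs of parallel convolution branches summed together equal a single convolution whose kernel is the sum of the branch kernels (smaller kernels zero-padded to the largest size), as in RepVGG (Ding et al. 2021). The following minimal numpy sketch illustrates this for a single-channel 3D voxel grid; it is an assumed toy setup for exposition, not the paper's actual RepPVConv implementation, which operates on multi-channel features with batch normalization folded in as well.

```python
import numpy as np

def conv3d(x, k):
    """Naive single-channel 3D convolution with zero padding ('same' output size)."""
    kd, kh, kw = k.shape
    pd, ph, pw = kd // 2, kh // 2, kw // 2
    xp = np.pad(x, ((pd, pd), (ph, ph), (pw, pw)))
    out = np.zeros(x.shape)
    for d in range(x.shape[0]):
        for h in range(x.shape[1]):
            for w in range(x.shape[2]):
                out[d, h, w] = np.sum(xp[d:d+kd, h:h+kh, w:w+kw] * k)
    return out

rng = np.random.default_rng(0)
x  = rng.standard_normal((6, 6, 6))   # toy voxel grid
k3 = rng.standard_normal((3, 3, 3))   # 3x3x3 training-time branch
k1 = rng.standard_normal((1, 1, 1))   # 1x1x1 training-time branch

# Training-time (high-capacity, multi-branch) output: sum of the two branches.
y_train = conv3d(x, k3) + conv3d(x, k1)

# Inference-time reparameterization: fold the 1x1x1 kernel into the
# centre tap of the 3x3x3 kernel, yielding a single equivalent conv.
k_fused = k3.copy()
k_fused[1, 1, 1] += k1[0, 0, 0]
y_infer = conv3d(x, k_fused)

print(np.allclose(y_train, y_infer))  # True: identical outputs, one branch fewer
```

The fused model therefore pays the cost of a single 3D convolution at inference while having been optimized with the richer multi-branch structure during training.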
References
Armeni, I., Sener, O., Zamir, A.R. et al.: 3d semantic parsing of large-scale indoor spaces. In: CVPR, pp. 1534–1543 (2016)
Armeni, I., Sax, S., Zamir, A.R., et al.: Joint 2d-3d-semantic data for indoor scene understanding. arXiv preprint arXiv:1702.01105 (2017)
Bronstein, M.M., Bruna, J., LeCun, Y., et al.: Geometric deep learning: going beyond Euclidean data. IEEE Signal Process. Mag. 34(4), 18–42 (2017)
Chang, A.X., Funkhouser, T., Guibas, L., et al.: Shapenet: an information-rich 3d model repository. arXiv preprint arXiv:1512.03012 (2015)
Chen, L., Zhang, Q.: Ddgcn: graph convolution network based on direction and distance for point cloud learning. Vis. Comput. 1–11 (2022). https://doi.org/10.1007/s00371-021-02351-8
Chen, Y., Peng, W., Tang, K., et al.: Pyrapvconv: efficient 3d point cloud perception with pyramid voxel convolution and sharable attention. Comput. Intell. Neurosci. 2022, 1–9 (2022). https://doi.org/10.1155/2022/2286818
Choy, C., Gwak, J., Savarese, S.: 4d spatio-temporal convnets: Minkowski convolutional neural networks. In: CVPR, pp. 3075–3084 (2019)
Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., et al.: 3d u-net: learning dense volumetric segmentation from sparse annotation. In: MICCAI, pp. 424–432 (2016)
Ding, X., Zhang, X., Ma, N., et al.: Repvgg: making vgg-style convnets great again. In: CVPR (2021)
Engelcke, M., Rao, D., Wang, D.Z., et al.: Vote3deep: fast object detection in 3d point clouds using efficient convolutional neural networks. In: ICRA, pp. 1355–1361. IEEE (2017)
Geiger, A., Lenz, P., Stiller, C., et al.: Vision meets robotics: the kitti dataset. IJRR 32(11), 1231–1237 (2013)
Graham, B., Engelcke, M., Van Der Maaten, L.: 3d semantic segmentation with submanifold sparse convolutional networks. In: CVPR, pp. 9224–9232 (2018)
Guo, Y., Bennamoun, M., Sohel, F., et al.: 3d object recognition in cluttered scenes with local surface features: a survey. IEEE TPAMI 36(11), 2270–2287 (2014)
Guo, Y., Wang, H., Hu, Q., et al.: Deep learning for 3d point clouds: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 43(12), 4338–4364 (2020)
Hu, Q., Yang, B., Xie, L., et al.: Randla-net: efficient semantic segmentation of large-scale point clouds. In: CVPR, pp. 11,108–11,117 (2020)
Ioannidou, A., Chatzilari, E., Nikolopoulos, S., et al.: Deep learning advances in computer vision with 3d data: a survey. ACM Comput. Surv. (CSUR) 50(2), 1–38 (2017)
Kingma, D.P., Welling, M., et al.: An introduction to variational autoencoders. Found. Trends® Mach. Learn. 12(4), 307–392 (2019)
Li, B.: 3d fully convolutional network for vehicle detection in point cloud. In: IROS, pp. 1513–1518 (2017)
Li, Y., Bu, R., Sun, M., et al.: Pointcnn: convolution on x-transformed points. In: NeurIPS, pp. 820–830 (2018)
Lin, N., Li, Y., Tang, K., et al.: Manipulation planning from demonstration via goal-conditioned prior action primitive decomposition and alignment. IEEE Robot. Autom. Lett. 7(2), 1387–1394 (2022)
Liu, Z., Tang, H., Lin, Y., et al.: Point-voxel CNN for efficient 3d deep learning. In: NeurIPS (2019)
Maturana, D., Scherer, S.: Voxnet: a 3d convolutional neural network for real-time object recognition. In: IROS, pp. 922–928 (2015)
Noh, J., Lee, S., Ham, B.: Hvpr: hybrid voxel-point representation for single-stage 3d object detection. In: CVPR, pp. 14,605–14,614 (2021)
Paszke, A., Gross, S., Massa, F., et al.: Pytorch: an imperative style, high-performance deep learning library. In: NeurIPS, pp. 8026–8037 (2019)
Qi, C.R., Su, H., Mo, K., et al.: Pointnet: deep learning on point sets for 3d classification and segmentation. In: CVPR, pp. 652–660 (2017a)
Qi, C.R., Yi, L., Su, H., et al.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: NeurIPS, pp. 5105–5114 (2017b)
Qi, C.R., Liu, W., Wu, C., et al.: Frustum pointnets for 3d object detection from rgb-d data. In: CVPR, pp. 918–927 (2018)
Que, Z., Lu, G., Xu, D.: Voxelcontext-net: an octree based framework for point cloud compression. In: CVPR, pp. 6042–6051 (2021)
Riegler, G., Osman Ulusoy, A., Geiger, A.: Octnet: learning deep 3d representations at high resolutions. In: CVPR, pp. 3577–3586 (2017)
Shi, S., Guo, C., Jiang, L., et al.: PV-RCNN: point-voxel feature set abstraction for 3d object detection. In: CVPR, pp. 10,529–10,538 (2020)
Shi, S., Jiang, L., Deng, J., et al.: PV-RCNN++: point-voxel feature set abstraction with local vector representation for 3d object detection. arXiv preprint arXiv:2102.00463 (2021)
Shi, W., Rajkumar, R.: Point-GNN: graph neural network for 3d object detection in a point cloud. In: CVPR, pp. 1711–1719 (2020)
Song, S., Xiao, J.: Sliding shapes for 3d object detection in depth images. In: ECCV, pp. 634–651 (2014)
Song, S., Xiao, J.: Deep sliding shapes for amodal 3d object detection in rgb-d images. In: CVPR, pp. 808–816 (2016)
Su, H., Maji, S., Kalogerakis, E., et al.: Multi-view convolutional neural networks for 3d shape recognition. In: ICCV, pp. 945–953 (2015)
Sun, Y., Miao, Y., Chen, J., et al.: Pgcnet: patch graph convolutional network for point cloud segmentation of indoor scenes. Vis. Comput. 36(10), 2407–2418 (2020)
Tang, H., Liu, Z., Zhao, S., et al.: Searching efficient 3d architectures with sparse point-voxel convolution. In: ECCV, pp. 685–702 (2020)
Tang, K., Ma, Y., Miao, D., et al.: Decision fusion networks for image classification. IEEE Trans. Neural Netw. Learn. Syst. (2022). https://doi.org/10.1109/TNNLS.2022.3196129
Thomas, H., Qi, C.R., Deschaud, J.E., et al.: Kpconv: flexible and deformable convolution for point clouds. In: ICCV, pp. 6411–6420 (2019)
Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. In: NeurIPS (2017)
Veit, A., Wilber, M., Belongie, S.: Residual networks behave like ensembles of relatively shallow networks. In: NeurIPS, pp. 550–558 (2016)
Wang, P.S., Liu, Y., Guo, Y.X., et al.: O-cnn: Octree-based convolutional neural networks for 3d shape analysis. ACM TOG 36(4), 1–11 (2017)
Wang, P.S., Sun, C.Y., Liu, Y., et al.: Adaptive o-cnn: a patch-based deep representation of 3d shapes. ACM TOG 37(6), 1–11 (2018)
Wang, Y., Sun, Y., Liu, Z., et al.: Dynamic graph cnn for learning on point clouds. ACM TOG (SIGGRAPH) 38(5), 1–12 (2019)
Wei, Y., Wang, Z., Rao, Y., et al.: Pv-raft: point-voxel correlation fields for scene flow estimation of point clouds. In: CVPR, pp. 6954–6963 (2021)
Wu, W., Qi, Z., Fuxin, L.: Pointconv: deep convolutional networks on 3d point clouds. In: CVPR, pp. 9621–9630 (2019)
Wu, Z., Song, S., Khosla, A., et al.: 3d shapenets: a deep representation for volumetric shapes. In: CVPR, pp. 1912–1920 (2015)
Xu, M., Ding, R., Zhao, H., et al.: Paconv: position adaptive convolution with dynamic kernel assembling on point clouds. In: CVPR (2021)
Zagoruyko, S., Komodakis, N.: Diracnets: training very deep neural networks without skip-connections. arXiv preprint arXiv:1706.00388 (2017)
Zhang, F., Fang, J., Wah, B.W., et al.: Deep fusionnet for point cloud semantic segmentation. In: ECCV, pp. 644–663 (2020)
Zhao, H., Jiang, L., Fu, C.W., et al.: Pointweb: enhancing local neighborhood features for point cloud processing. In: CVPR, pp. 5565–5573 (2019)
Zhao, H., Jiang, L., Jia, J., et al.: Point transformer. In: ICCV, pp. 16,259–16,268 (2021)
Zhou, Y., Tuzel, O.: Voxelnet: end-to-end learning for point cloud based 3d object detection. In: CVPR, pp. 4490–4499 (2018)
Acknowledgements
We thank the reviewers for their valuable comments. This work was supported in part by the National Natural Science Foundation of China (62102105, 62072126), the Guangdong Basic and Applied Basic Research Foundation (2020A1515110997, 2022A1515011501, and 2022A1515010138), the Science and Technology Program of Guangzhou (202002030263, 202102010419, and 202201020229), and the Open Project Program of the State Key Lab of CAD and CG (A2218), Zhejiang University.
Ethics declarations
Conflict of interest
We declare that we have no competing financial interests or personal relationships that could have appeared to influence our work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tang, K., Chen, Y., Peng, W. et al. RepPVConv: attentively fusing reparameterized voxel features for efficient 3D point cloud perception. Vis Comput 39, 5577–5588 (2023). https://doi.org/10.1007/s00371-022-02682-0