Abstract
Designing efficient deep learning models for 3D point clouds is an important research topic. Point-voxel convolution (Liu et al. in NeurIPS, 2019) is a pioneering approach in this direction, but it leaves considerable room for improvement: it relies on several layers of plain 3D convolutions and fuses point and voxel features with a simple linear operation. To address these issues, we propose a novel reparameterizable point-voxel convolution (RepPVConv) block. First, RepPVConv adopts two reparameterizable 3D convolution modules to extract more informative voxel features without introducing any extra computational overhead at inference time. The rationale is that these modules are trained in a high-capacity multi-branch form and then reparameterized into a compact single-branch form for inference, losslessly preserving the trained behavior. Second, RepPVConv attentively fuses the reparameterized voxel features with the point features. Because this fusion operates in a nonlinear manner, the descriptive reparameterized voxel features can be exploited more fully. Extensive experimental results show that RepPVConv-based networks are efficient in terms of both GPU memory consumption and computational complexity and significantly outperform state-of-the-art methods.
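The lossless train-to-inference conversion rests on the linearity of convolution: the outputs of parallel convolution branches summed together equal a single convolution whose kernel is the sum of the branch kernels (smaller kernels zero-padded to the largest size), as in RepVGG (Ding et al. 2021). The following minimal numpy sketch illustrates this for a single-channel 3D voxel grid; it is an assumed toy setup for exposition, not the paper's actual RepPVConv implementation, which operates on multi-channel features with batch normalization folded in as well.

```python
import numpy as np

def conv3d(x, k):
    """Naive single-channel 3D convolution with zero padding ('same' output size)."""
    kd, kh, kw = k.shape
    pd, ph, pw = kd // 2, kh // 2, kw // 2
    xp = np.pad(x, ((pd, pd), (ph, ph), (pw, pw)))
    out = np.zeros(x.shape)
    for d in range(x.shape[0]):
        for h in range(x.shape[1]):
            for w in range(x.shape[2]):
                out[d, h, w] = np.sum(xp[d:d+kd, h:h+kh, w:w+kw] * k)
    return out

rng = np.random.default_rng(0)
x  = rng.standard_normal((6, 6, 6))   # toy voxel grid
k3 = rng.standard_normal((3, 3, 3))   # 3x3x3 training-time branch
k1 = rng.standard_normal((1, 1, 1))   # 1x1x1 training-time branch

# Training-time (high-capacity, multi-branch) output: sum of the two branches.
y_train = conv3d(x, k3) + conv3d(x, k1)

# Inference-time reparameterization: fold the 1x1x1 kernel into the
# centre tap of the 3x3x3 kernel, yielding a single equivalent conv.
k_fused = k3.copy()
k_fused[1, 1, 1] += k1[0, 0, 0]
y_infer = conv3d(x, k_fused)

print(np.allclose(y_train, y_infer))  # True: identical outputs, one branch fewer
```

The fused model therefore pays the cost of a single 3D convolution at inference while having been optimized with the richer multi-branch structure during training.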
References
Armeni, I., Sener, O., Zamir, A.R. et al.: 3d semantic parsing of large-scale indoor spaces. In: CVPR, pp. 1534–1543 (2016)
Armeni, I., Sax, S., Zamir, A.R., et al.: Joint 2d-3d-semantic data for indoor scene understanding. arXiv preprint arXiv:1702.01105 (2017)
Bronstein, M.M., Bruna, J., LeCun, Y., et al.: Geometric deep learning: going beyond Euclidean data. IEEE Signal Process. Mag. 34(4), 18–42 (2017)
Chang, A.X., Funkhouser, T., Guibas, L., et al.: Shapenet: an information-rich 3d model repository. arXiv preprint arXiv:1512.03012 (2015)
Chen, L., Zhang, Q.: Ddgcn: graph convolution network based on direction and distance for point cloud learning. Vis. Comput. 1–11 (2022). https://doi.org/10.1007/s00371-021-02351-8
Chen, Y., Peng, W., Tang, K., et al.: Pyrapvconv: efficient 3d point cloud perception with pyramid voxel convolution and sharable attention. Comput. Intell. Neurosci. 2022, 1–9 (2022). https://doi.org/10.1155/2022/2286818
Choy, C., Gwak, J., Savarese, S.: 4d spatio-temporal convnets: Minkowski convolutional neural networks. In: CVPR, pp. 3075–3084 (2019)
Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., et al.: 3d u-net: learning dense volumetric segmentation from sparse annotation. In: MICCAI, pp. 424–432 (2016)
Ding, X., Zhang, X., Ma, N., et al.: Repvgg: making vgg-style convnets great again. In: CVPR (2021)
Engelcke, M., Rao, D., Wang, D.Z., et al.: Vote3deep: fast object detection in 3d point clouds using efficient convolutional neural networks. In: ICRA, pp. 1355–1361. IEEE (2017)
Geiger, A., Lenz, P., Stiller, C., et al.: Vision meets robotics: the kitti dataset. IJRR 32(11), 1231–1237 (2013)
Graham, B., Engelcke, M., Van Der Maaten, L.: 3d semantic segmentation with submanifold sparse convolutional networks. In: CVPR, pp. 9224–9232 (2018)
Guo, Y., Bennamoun, M., Sohel, F., et al.: 3d object recognition in cluttered scenes with local surface features: a survey. IEEE TPAMI 36(11), 2270–2287 (2014)
Guo, Y., Wang, H., Hu, Q., et al.: Deep learning for 3d point clouds: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 43(12), 4338–4364 (2020)
Hu, Q., Yang, B., Xie, L., et al.: Randla-net: efficient semantic segmentation of large-scale point clouds. In: CVPR, pp. 11,108–11,117 (2020)
Ioannidou, A., Chatzilari, E., Nikolopoulos, S., et al.: Deep learning advances in computer vision with 3d data: a survey. ACM Comput. Surv. (CSUR) 50(2), 1–38 (2017)
Kingma, D.P., Welling, M., et al.: An introduction to variational autoencoders. Found. Trends® Mach. Learn. 12(4), 307–392 (2019)
Li, B.: 3d fully convolutional network for vehicle detection in point cloud. In: IROS, pp. 1513–1518 (2017)
Li, Y., Bu, R., Sun, M., et al.: Pointcnn: convolution on x-transformed points. In: NeurIPS, pp. 820–830 (2018)
Lin, N., Li, Y., Tang, K., et al.: Manipulation planning from demonstration via goal-conditioned prior action primitive decomposition and alignment. IEEE Robot. Autom. Lett. 7(2), 1387–1394 (2022)
Liu, Z., Tang, H., Lin, Y., et al.: Point-voxel CNN for efficient 3d deep learning. In: NeurIPS (2019)
Maturana, D., Scherer, S.: Voxnet: a 3d convolutional neural network for real-time object recognition. In: IROS, pp. 922–928 (2015)
Noh, J., Lee, S., Ham, B.: Hvpr: hybrid voxel-point representation for single-stage 3d object detection. In: CVPR, pp. 14,605–14,614 (2021)
Paszke, A., Gross, S., Massa, F., et al.: Pytorch: an imperative style, high-performance deep learning library. In: NeurIPS, pp. 8026–8037 (2019)
Qi, C.R., Su, H., Mo, K., et al.: Pointnet: deep learning on point sets for 3d classification and segmentation. In: CVPR, pp. 652–660 (2017a)
Qi, C.R., Yi, L., Su, H., et al.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: NeurIPS, pp. 5105–5114 (2017b)
Qi, C.R., Liu, W., Wu, C., et al.: Frustum pointnets for 3d object detection from rgb-d data. In: CVPR, pp. 918–927 (2018)
Que, Z., Lu, G., Xu, D.: Voxelcontext-net: an octree based framework for point cloud compression. In: CVPR, pp. 6042–6051 (2021)
Riegler, G., Osman Ulusoy, A., Geiger, A.: Octnet: learning deep 3d representations at high resolutions. In: CVPR, pp. 3577–3586 (2017)
Shi, S., Guo, C., Jiang, L., et al.: PV-RCNN: point-voxel feature set abstraction for 3d object detection. In: CVPR, pp. 10,529–10,538 (2020)
Shi, S., Jiang, L., Deng, J., et al.: PV-RCNN++: point-voxel feature set abstraction with local vector representation for 3d object detection. arXiv preprint arXiv:2102.00463 (2021)
Shi, W., Rajkumar, R.: Point-GNN: graph neural network for 3d object detection in a point cloud. In: CVPR, pp. 1711–1719 (2020)
Song, S., Xiao, J.: Sliding shapes for 3d object detection in depth images. In: ECCV, pp. 634–651 (2014)
Song, S., Xiao, J.: Deep sliding shapes for amodal 3d object detection in rgb-d images. In: CVPR, pp. 808–816 (2016)
Su, H., Maji, S., Kalogerakis, E., et al.: Multi-view convolutional neural networks for 3d shape recognition. In: ICCV, pp. 945–953 (2015)
Sun, Y., Miao, Y., Chen, J., et al.: Pgcnet: patch graph convolutional network for point cloud segmentation of indoor scenes. Vis. Comput. 36(10), 2407–2418 (2020)
Tang, H., Liu, Z., Zhao, S., et al.: Searching efficient 3d architectures with sparse point-voxel convolution. In: ECCV, pp. 685–702 (2020)
Tang, K., Ma, Y., Miao, D., et al.: Decision fusion networks for image classification. IEEE Trans. Neural Netw. Learn. Syst. (2022). https://doi.org/10.1109/TNNLS.2022.3196129
Thomas, H., Qi, C.R., Deschaud, J.E., et al.: Kpconv: flexible and deformable convolution for point clouds. In: ICCV, pp. 6411–6420 (2019)
Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. In: NeurIPS (2017)
Veit, A., Wilber, M., Belongie, S.: Residual networks behave like ensembles of relatively shallow networks. In: NeurIPS, pp. 550–558 (2016)
Wang, P.S., Liu, Y., Guo, Y.X., et al.: O-cnn: Octree-based convolutional neural networks for 3d shape analysis. ACM TOG 36(4), 1–11 (2017)
Wang, P.S., Sun, C.Y., Liu, Y., et al.: Adaptive o-cnn: a patch-based deep representation of 3d shapes. ACM TOG 37(6), 1–11 (2018)
Wang, Y., Sun, Y., Liu, Z., et al.: Dynamic graph cnn for learning on point clouds. ACM TOG (SIGGRAPH) 38(5), 1–12 (2019)
Wei, Y., Wang, Z., Rao, Y., et al.: Pv-raft: point-voxel correlation fields for scene flow estimation of point clouds. In: CVPR, pp. 6954–6963 (2021)
Wu, W., Qi, Z., Fuxin, L.: Pointconv: deep convolutional networks on 3d point clouds. In: CVPR, pp. 9621–9630 (2019)
Wu, Z., Song, S., Khosla, A., et al.: 3d shapenets: a deep representation for volumetric shapes. In: CVPR, pp. 1912–1920 (2015)
Xu, M., Ding, R., Zhao, H., et al.: Paconv: position adaptive convolution with dynamic kernel assembling on point clouds. In: CVPR (2021)
Zagoruyko, S., Komodakis, N.: Diracnets: training very deep neural networks without skip-connections. arXiv preprint arXiv:1706.00388 (2017)
Zhang, F., Fang, J., Wah, B.W., et al.: Deep fusionnet for point cloud semantic segmentation. In: ECCV, pp. 644–663 (2020)
Zhao, H., Jiang, L., Fu, C.W., et al.: Pointweb: enhancing local neighborhood features for point cloud processing. In: CVPR, pp. 5565–5573 (2019)
Zhao, H., Jiang, L., Jia, J., et al.: Point transformer. In: ICCV, pp. 16,259–16,268 (2021)
Zhou, Y., Tuzel, O.: Voxelnet: end-to-end learning for point cloud based 3d object detection. In: CVPR, pp. 4490–4499 (2018)
Acknowledgements
We thank the reviewers for their valuable comments. This work was supported in part by the National Natural Science Foundation of China (62102105, 62072126), the Guangdong Basic and Applied Basic Research Foundation (2020A1515110997, 2022A1515011501, and 2022A1515010138), the Science and Technology Program of Guangzhou (202002030263, 202102010419, and 202201020229), and the Open Project Program of the State Key Lab of CAD and CG (A2218), Zhejiang University.
Ethics declarations
Conflict of interest
We declare that we have no competing financial interests or personal relationships that could have appeared to influence our work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tang, K., Chen, Y., Peng, W. et al. RepPVConv: attentively fusing reparameterized voxel features for efficient 3D point cloud perception. Vis Comput 39, 5577–5588 (2023). https://doi.org/10.1007/s00371-022-02682-0