Abstract
In recent years, point cloud registration has achieved great success by learning geometric features with deep learning techniques. However, existing approaches that rely on pure geometric context still suffer from sensor noise and geometric ambiguities (e.g., flat or symmetric structures), which limits their robustness in real-world scenes. When 3D point clouds are constructed by RGB-D cameras, the learned features can be enhanced with complementary texture information from RGB images. To this end, we propose to learn a 3D hybrid feature that fully exploits the multi-view color images and point clouds from indoor RGB-D scene scans. Specifically, to address the discrepancy between 2D and 3D observations, we extract informative 2D features from the image planes and take only these features for fusion. We then utilize a novel soft-fusion module to associate and fuse hybrid features in a unified space while alleviating the ambiguities of 2D–3D feature binding. Finally, we develop a self-supervised feature scoring module customized for our multi-modal hybrid features, which significantly improves keypoint selection quality in noisy indoor scene scans. Our method achieves registration performance competitive with previous methods on two real-world datasets.
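The core idea of associating per-point 3D features with 2D features sampled from the image plane can be illustrated with a minimal NumPy sketch. This is purely illustrative and not the paper's implementation: it assumes a pinhole camera with intrinsics K, nearest-pixel sampling instead of a learned association, and a single scalar fusion weight in place of the paper's soft-fusion module; all function names here are hypothetical.

```python
import numpy as np

def project_points(points, K):
    """Project Nx3 camera-frame points to pixel coordinates using intrinsics K."""
    uv = (K @ points.T).T          # Nx3 homogeneous image coordinates
    return uv[:, :2] / uv[:, 2:3]  # perspective division by depth

def fuse_features(feat3d, feat2d, pixels, image_hw, weight=0.5):
    """Blend per-point 3D features with 2D features sampled at their projections.

    Points that project outside the image keep their 3D feature unchanged,
    a crude stand-in for only fusing informative 2D observations.
    """
    h, w = image_hw
    u = np.clip(np.round(pixels[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(pixels[:, 1]).astype(int), 0, h - 1)
    inside = (pixels[:, 0] >= 0) & (pixels[:, 0] < w) & \
             (pixels[:, 1] >= 0) & (pixels[:, 1] < h)
    sampled = feat2d[v, u]              # NxC features gathered from the feature map
    w2d = weight * inside[:, None]      # zero weight for off-image points
    return (1 - w2d) * feat3d + w2d * sampled

# Toy example: 4 points, 8-dim features, a 16x16 2D feature map.
K = np.array([[10.0, 0.0, 8.0], [0.0, 10.0, 8.0], [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 1.0], [0.5, 0.5, 2.0], [5.0, 5.0, 1.0], [-0.1, 0.2, 1.5]])
f3d = np.random.rand(4, 8)
f2d = np.random.rand(16, 16, 8)
pix = project_points(pts, K)
fused = fuse_features(f3d, f2d, pix, (16, 16))
assert fused.shape == (4, 8)
```

In this sketch the third point projects far outside the 16x16 map, so its fused feature degenerates to the pure 3D feature; a learned soft-fusion module would instead predict per-point association weights.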
Acknowledgements
This work was partially supported by National Natural Science Foundation of China (Grant No. 61932003) and ZJU-SenseTime Joint Lab of 3D Vision.
Supporting information
Appendixes A and B. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
Cite this article
Yang, B., Huang, Z., Li, Y. et al. Hybrid3D: learning 3D hybrid features with point clouds and multi-view images for point cloud registration. Sci. China Inf. Sci. 66, 172101 (2023). https://doi.org/10.1007/s11432-022-3604-6