Abstract
LiDAR-based place recognition plays a crucial role in autonomous vehicles, enabling them to recognize previously visited locations in GPS-denied environments. Localization is then achieved by searching the database for the nearest neighbors of the current descriptor. Place recognition features fall into two common types: local descriptors, which compactly represent individual points or regions, and global descriptors, which summarize the scene as a whole. Despite the significant progress both types have made in recent years, any single representation inevitably loses information. To overcome this limitation, we developed PatchLPR, a Transformer network that employs multi-level feature fusion for robust place recognition. PatchLPR integrates global and local feature information and focuses on meaningful regions of the feature map to generate an environmental representation. We propose a patch feature extraction module based on the Vision Transformer to fully exploit the information in, and the correlations among, different features. We evaluated our approach on the KITTI dataset and a self-collected dataset covering over 4.2 km. The experimental results demonstrate that our method effectively leverages multi-level features to enhance place recognition performance.
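The retrieval step described above — patch features pooled into a global descriptor, then matched against a database by nearest-neighbor search — can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the patch size, the mean-pooling aggregation, and the cosine-similarity ranking are all simplifying assumptions standing in for the paper's learned Transformer fusion.

```python
import numpy as np

def patchify(feature_map, patch):
    """Split an H x W x C feature map into flattened patch vectors."""
    H, W, C = feature_map.shape
    assert H % patch == 0 and W % patch == 0
    return (feature_map
            .reshape(H // patch, patch, W // patch, patch, C)
            .transpose(0, 2, 1, 3, 4)          # group rows/cols of patches
            .reshape(-1, patch * patch * C))   # one row per patch

def global_descriptor(feature_map, patch=4):
    """Crude global descriptor: mean over patch vectors, L2-normalized."""
    p = patchify(feature_map, patch)
    g = p.mean(axis=0)
    return g / (np.linalg.norm(g) + 1e-12)

def retrieve(query, database, k=1):
    """Rank database descriptors by cosine similarity (all unit-norm)."""
    sims = database @ query
    order = np.argsort(-sims)
    return order[:k], sims[order[:k]]

# Usage: build a small database of descriptors and query with one of them.
rng = np.random.default_rng(0)
maps = rng.normal(size=(5, 8, 8, 2))          # 5 hypothetical feature maps
db = np.stack([global_descriptor(m) for m in maps])
idx, sims = retrieve(db[3], db, k=1)          # self-query returns index 3
```

In a real system the database search would typically use an approximate nearest-neighbor library such as FAISS (cited in the references) rather than a brute-force dot product.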
![Fig. 1](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03138-9/MediaObjects/11760_2024_3138_Fig1_HTML.png)
![Fig. 2](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03138-9/MediaObjects/11760_2024_3138_Fig2_HTML.png)
![Fig. 3](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03138-9/MediaObjects/11760_2024_3138_Fig3_HTML.png)
![Fig. 4](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03138-9/MediaObjects/11760_2024_3138_Fig4_HTML.png)
![Fig. 5](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03138-9/MediaObjects/11760_2024_3138_Fig5_HTML.png)
Data availability statement
The KITTI dataset used in this research is publicly available online; the HUE dataset is available from the authors upon request.
References
Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., Sivic, J.: Netvlad: CNN architecture for weakly supervised place recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5297–5307 (2016). arXiv:1511.07247
Hausler, S., Garg, S., Xu, M., Milford, M., Fischer, T.: Patch-netvlad: multi-scale fusion of locally-global descriptors for place recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14141–14152 (2021). arXiv:2103.01486
Lowry, S., Sünderhauf, N., Newman, P., Leonard, J.J., Cox, D., Corke, P., Milford, M.J.: Visual place recognition: a survey. IEEE Trans. Robot. 32(1), 1–19 (2015). https://doi.org/10.1109/TRO.2015.2496823
Schuster, R., Wasenmuller, O., Unger, C., Stricker, D.: Sdc-stacked dilated convolution: a unified descriptor network for dense matching tasks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2556–2565 (2019). arXiv:1904.03076
Cao, B., Araujo, A., Sim, J.: Unifying deep local and global features for image search. In: Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XX 16, pp. 726–743 (2020). arXiv:2001.05027
Wang, R., Shen, Y., Zuo, W., Zhou, S., Zheng, N.: Transvpr: Transformer-based place recognition with multi-level attention aggregation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13648–13657 (2022). arXiv:2201.02001
Yin, H., Xu, X., Lu, S., Chen, X., Xiong, R., Shen, S., Stachniss, C., Wang, Y.: A survey on global lidar localization: challenges, advances and open problems. arXiv preprint arXiv:2302.07433 (2023)
Chen, X., Läbe, T., Milioto, A., Röhling, T., Vysotska, O., Haag, A., Behley, J., Stachniss, C.: Overlapnet: loop closing for lidar-based slam. arXiv preprint arXiv:2105.11344 (2021)
Ma, J., Zhang, J., Xu, J., Ai, R., Gu, W., Chen, X.: Overlaptransformer: an efficient and yaw-angle-invariant transformer network for lidar-based place recognition. IEEE Robot. Autom. Lett. 7(3), 6958–6965 (2022). https://doi.org/10.1109/LRA.2022.3178797
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth \(16\times 16\) words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Uy, M.A., Lee, G.H.: Pointnetvlad: Deep point cloud based retrieval for large-scale place recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4470–4479 (2018). arXiv:1804.03492
Kim, G., Kim, A.: Scan context: egocentric spatial descriptor for place recognition within 3d point cloud map. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4802–4809 (2018). https://doi.org/10.1109/IROS.2018.8593953
Kong, X., Yang, X., Zhai, G., Zhao, X., Zeng, X., Wang, M., Liu, Y., Li, W., Wen, F.: Semantic graph based place recognition for 3d point clouds. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 8216–8223 (2020). https://doi.org/10.1109/IROS45743.2020.9341060
Vidanapathirana, K., Moghadam, P., Harwood, B., Zhao, M., Sridharan, S., Fookes, C.: Locus: lidar-based place recognition using spatiotemporal higher-order pooling. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 5075–5081 (2021). https://doi.org/10.1109/ICRA48506.2021.9560915
Vysotska, O., Stachniss, C.: Relocalization under substantial appearance changes using hashing. In: Proceedings of the IROS Workshop on Planning, Perception and Navigation for Intelligent Vehicles, Vancouver, BC, Canada, vol. 24 (2017)
Li, J., Hu, Q., Ai, M.: Rift: multi-modal image matching based on radiation-variation insensitive feature transform. IEEE Trans. Image Process. 29, 3296–3310 (2019). https://doi.org/10.1109/TIP.2019.2959244
Luo, L., Cao, S.-Y., Sheng, Z., Shen, H.-L.: Lidar-based global localization using histogram of orientations of principal normals. IEEE Trans. Intell. Veh. 7(3), 771–782 (2022). https://doi.org/10.1109/TIV.2022.3169153
Rizzini, D.L.: Place recognition of 3d landmarks based on geometric relations. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 648–654 (2017). https://doi.org/10.1109/IROS.2017.8202220
Guo, J., Borges, P.V., Park, C., Gawel, A.: Local descriptor for robust place recognition using lidar intensity. IEEE Robot. Autom. Lett. 4(2), 1470–1477 (2019). https://doi.org/10.1109/LRA.2019.2893887
Xiang, H., Zhu, X., Shi, W., Fan, W., Chen, P., Bao, S.: Delightlcd: a deep and lightweight network for loop closure detection in lidar slam. IEEE Sens. J. 22(21), 20761–20772 (2022). https://doi.org/10.1109/JSEN.2022.3206506
Zhou, Y., Wang, Y., Poiesi, F., Qin, Q., Wan, Y.: Loop closure detection using local 3d deep descriptors. IEEE Robot. Autom. Lett. 7(3), 6335–6342 (2022). arXiv:2111.00440
Poiesi, F., Boscaini, D.: Distinctive 3d local deep descriptors. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 5720–5727 (2021). https://doi.org/10.1109/ICPR48806.2021.9411978
Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017). arXiv:1612.00593
Liu, Z., Zhou, S., Suo, C., Yin, P., Chen, W., Wang, H., Li, H., Liu, Y.-H.: Lpd-net: 3d point cloud learning for large-scale place recognition and environment analysis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2831–2840 (2019). arXiv:1812.07050
Zhang, W., Xiao, C.: Pcan: 3d attention map learning using contextual information for point cloud based retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12436–12445 (2019). arXiv:1904.09793
Komorowski, J.: Minkloc3d: Point cloud based large-scale place recognition. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1790–1799 (2021). arXiv:2011.04530
Zhou, Z., Zhao, C., Adolfsson, D., Su, S., Gao, Y., Duckett, T., Sun, L.: Ndt-transformer: Large-scale 3d point cloud localisation using the normal distribution transform representation. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 5654–5660 (2021). https://doi.org/10.1109/ICRA48506.2021.9560932
Ma, J., Xiong, G., Xu, J., Chen, X.: Cvtnet: a cross-view transformer network for lidar-based place recognition in autonomous driving environments. IEEE Trans. Ind. Inf. (2023). https://doi.org/10.1109/TII.2023.3313635
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Adv. Neural Inf. Process. Syst. (2017). arXiv:1706.03762
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the KITTI vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361 (2012). https://doi.org/10.1109/CVPR.2012.6248074
Pandey, G., McBride, J.R., Eustice, R.M.: Ford campus vision and lidar data set. Int. J. Robot. Res. 30(13), 1543–1552 (2011)
Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with GPUs. IEEE Trans. Big Data 7(3), 535–547 (2019). https://doi.org/10.1109/TBDATA.2019.2921572
Funding
Research on Key Technologies of Intelligent Equipment for Mine Powered by Pure Clean Energy, Natural Science Foundation of Hebei Province, F2021402011
Author information
Authors and Affiliations
Contributions
All authors contributed their insights to the research concept and, after review and discussion, approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, Y., Guo, J., Wang, H. et al. Patchlpr: a multi-level feature fusion transformer network for LiDAR-based place recognition. SIViP 18 (Suppl 1), 157–165 (2024). https://doi.org/10.1007/s11760-024-03138-9