
Patchlpr: a multi-level feature fusion transformer network for LiDAR-based place recognition

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

LiDAR-based place recognition plays a crucial role in autonomous vehicles, enabling the identification of previously visited locations in GPS-denied environments. Localization is then achieved by searching for the nearest neighbors of a query in a database. Place recognition features fall into two common types: local descriptors and global descriptors. Local descriptors typically provide a compact representation of points or regions, while global descriptors summarize the scene as a whole. Despite the significant progress both types have made in recent years, any single representation inevitably loses information. To overcome this limitation, we developed PatchLPR, a Transformer network employing multi-level feature fusion for robust place recognition. PatchLPR integrates global and local feature information and focuses on meaningful regions of the feature map to generate an environmental representation. We propose a patch feature extraction module based on the Vision Transformer to fully exploit the information and correlations of features at different levels. We evaluated our approach on the KITTI dataset and a self-collected dataset covering over 4.2 km. The experimental results demonstrate that our method effectively utilizes multi-level features to enhance place recognition performance.
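The retrieval step described above can be illustrated with a minimal sketch: each mapped place is represented by a descriptor vector, and a query is localized by finding its nearest neighbor under cosine similarity. The descriptor dimension, similarity measure, and function names below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Normalize descriptors so the dot product equals cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve_place(query_desc, database_descs, top_k=1):
    """Return indices of the top_k database descriptors most similar to the query."""
    q = l2_normalize(query_desc)
    db = l2_normalize(database_descs)
    sims = db @ q                      # cosine similarity to every mapped place
    return np.argsort(-sims)[:top_k]   # highest similarity first

# Toy database: 100 mapped places with hypothetical 256-dim descriptors.
rng = np.random.default_rng(0)
database = rng.standard_normal((100, 256))

# A "revisit" of place 42: its descriptor plus a small perturbation.
query = database[42] + 0.05 * rng.standard_normal(256)
print(retrieve_place(query, database))  # -> [42]
```

In practice, large-scale systems replace the brute-force scan with an approximate nearest-neighbor index, but the matching principle is the same.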




Data availability statement

The KITTI dataset used in this research is publicly available online; the HUE dataset is available from the authors on request.


Funding

This work was supported by the Natural Science Foundation of Hebei Province (Grant No. F2021402011), under the project "Research on Key Technologies of Intelligent Equipment for Mine Powered by Pure Clean Energy".

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed their insights to the research concept and, after review and discussion, approved the final manuscript.

Corresponding author

Correspondence to Jianhua Guo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Sun, Y., Guo, J., Wang, H. et al. Patchlpr: a multi-level feature fusion transformer network for LiDAR-based place recognition. SIViP 18 (Suppl 1), 157–165 (2024). https://doi.org/10.1007/s11760-024-03138-9
