Visual Cross-View Metric Localization with Dense Uncertainty Estimates

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

This work addresses visual cross-view metric localization for outdoor robotics. Given a ground-level color image and a satellite patch that contains the local surroundings, the task is to identify the location of the ground camera within the satellite patch. Related work addressed this task for range sensors (LiDAR, radar), but for vision it has only been treated as a secondary regression step after an initial cross-view image retrieval step. Since the local satellite patch could also be retrieved through any rough localization prior (e.g., from GPS/GNSS or temporal filtering), we drop the image retrieval objective and focus on metric localization only. We devise a novel network architecture with denser satellite descriptors, similarity matching at the bottleneck (rather than at the output as in image retrieval), and a dense spatial distribution as output to capture multi-modal localization ambiguities. We compare against a state-of-the-art regression baseline that uses global image descriptors. Quantitative and qualitative experimental results on the recently proposed VIGOR and the Oxford RobotCar datasets validate our design. The produced probabilities are correlated with localization accuracy, and can even be used to roughly estimate the ground camera’s heading when its orientation is unknown. Overall, our method reduces the median metric localization error by 51%, 37%, and 28% compared to the state-of-the-art when generalizing in the same area, across areas, and across time, respectively.
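The idea of matching a ground descriptor against dense satellite descriptors and reading out a spatial distribution can be illustrated with a toy sketch. This is not the authors' network: the descriptors here are hand-written 2-D vectors rather than learned features, and the use of cosine similarity, the softmax normalization, the `temperature` value, and all function names are assumptions chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def localization_heatmap(ground_desc, sat_desc_map, temperature=0.1):
    """Compare one ground descriptor with every cell of a satellite
    descriptor grid and normalize the scores with a softmax over all
    cells, so the result is a probability map that sums to 1."""
    scores = [[cosine(ground_desc, d) / temperature for d in row]
              for row in sat_desc_map]
    top = max(max(row) for row in scores)  # subtract max for stability
    exps = [[math.exp(s - top) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def most_likely_cell(heatmap):
    """Return (row, col) of the highest-probability cell."""
    best = max((p, r, c)
               for r, row in enumerate(heatmap)
               for c, p in enumerate(row))
    return best[1], best[2]


# Hypothetical 2x3 grid of satellite descriptors (toy values).
# Cells (0, 0) and (1, 1) both resemble the ground descriptor,
# which mimics an ambiguous scene with two plausible locations.
sat_desc_map = [
    [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.9, 0.1], [0.0, 1.0]],
]
ground_desc = [1.0, 0.0]

heatmap = localization_heatmap(ground_desc, sat_desc_map)
row, col = most_likely_cell(heatmap)
print("most likely cell:", (row, col))
for r in heatmap:
    print(["%.3f" % p for p in r])
```

In this toy grid the probability mass is split between two similar-looking cells instead of being collapsed into a single point estimate, which is the kind of multi-modal ambiguity a dense output can express and a single regressed coordinate cannot.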

Notes

  1. Models and code, plus extended data, are available at https://github.com/tudelft-iv/CrossViewMetricLocalization.

Acknowledgements

This work is part of the research programme Efficient Deep Learning (EDL) with project number P16-25, which is (partly) financed by the Dutch Research Council (NWO).

Author information

Corresponding author

Correspondence to Zimin Xia.

Electronic supplementary material

  • Supplementary material 1 (PDF, 10263 KB)

  • Supplementary material 2 (MP4, 9523 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Xia, Z., Booij, O., Manfredi, M., Kooij, J.F.P. (2022). Visual Cross-View Metric Localization with Dense Uncertainty Estimates. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13699. Springer, Cham. https://doi.org/10.1007/978-3-031-19842-7_6

  • DOI: https://doi.org/10.1007/978-3-031-19842-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19841-0

  • Online ISBN: 978-3-031-19842-7

  • eBook Packages: Computer Science, Computer Science (R0)
