Visual Cross-View Metric Localization with Dense Uncertainty Estimates

Xia, Zimin; Booij, Olaf; Manfredi, Marco; Kooij, Julian F. P.

doi:10.1007/978-3-031-19842-7_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13699)

Included in the following conference series:

European Conference on Computer Vision

Abstract

This work addresses visual cross-view metric localization for outdoor robotics. Given a ground-level color image and a satellite patch that contains the local surroundings, the task is to identify the location of the ground camera within the satellite patch. Related work addressed this task for range-sensors (LiDAR, Radar), but for vision, only as a secondary regression step after an initial cross-view image retrieval step. Since the local satellite patch could also be retrieved through any rough localization prior (e.g. from GPS/GNSS, temporal filtering), we drop the image retrieval objective and focus on the metric localization only. We devise a novel network architecture with denser satellite descriptors, similarity matching at the bottleneck (rather than at the output as in image retrieval), and a dense spatial distribution as output to capture multi-modal localization ambiguities. We compare against a state-of-the-art regression baseline that uses global image descriptors. Quantitative and qualitative experimental results on the recently proposed VIGOR and the Oxford RobotCar datasets validate our design. The produced probabilities are correlated with localization accuracy, and can even be used to roughly estimate the ground camera’s heading when its orientation is unknown. Overall, our method reduces the median metric localization error by 51%, 37%, and 28% compared to the state-of-the-art when generalizing respectively in the same area, across areas, and across time.
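
The abstract's central idea, matching a ground-image descriptor against dense satellite descriptors and producing a spatial distribution over the patch, can be illustrated with a minimal sketch. This is not the authors' network: the function names, the cosine-similarity matching, the softmax `temperature`, and the random toy descriptors are all assumptions made for illustration, and the learned encoders and decoder described in the paper are omitted entirely.

```python
import numpy as np


def localization_heatmap(ground_desc, sat_desc, temperature=0.1):
    """Toy matching step (hypothetical helper, not the paper's model).

    ground_desc: (C,) descriptor of the ground-level image.
    sat_desc:    (H, W, C) grid of descriptors over the satellite patch.
    Returns an (H, W) map of probabilities that sums to 1.
    """
    # L2-normalize so the dot product below is a cosine similarity.
    g = ground_desc / (np.linalg.norm(ground_desc) + 1e-12)
    s = sat_desc / (np.linalg.norm(sat_desc, axis=-1, keepdims=True) + 1e-12)
    sim = s @ g  # (H, W) similarity of every satellite cell to the ground view

    # Softmax over all cells turns similarities into a dense distribution,
    # which can stay multi-modal when several cells look alike.
    logits = sim / temperature
    logits = logits - logits.max()  # numerical stability
    prob = np.exp(logits)
    return prob / prob.sum()


def most_likely_cell(prob):
    """Return the (row, col) index of the highest-probability cell."""
    row, col = np.unravel_index(np.argmax(prob), prob.shape)
    return int(row), int(col)


# Toy data: the ground descriptor is a slightly noisy copy of cell (5, 2).
rng = np.random.default_rng(0)
sat = rng.normal(size=(8, 8, 16))
ground = sat[5, 2] + 0.01 * rng.normal(size=16)

prob = localization_heatmap(ground, sat)
row, col = most_likely_cell(prob)
print(row, col, float(prob[row, col]))
```

In this toy setup the peak of `prob` marks the estimated camera cell, and the height and spread of the map give a rough confidence signal, in the spirit of the abstract's claim that the produced probabilities correlate with localization accuracy.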


Notes

1. Models and code, plus extended data, are available at https://github.com/tudelft-iv/CrossViewMetricLocalization.


Acknowledgements

This work is part of the research programme Efficient Deep Learning (EDL) with project number P16-25, which is (partly) financed by the Dutch Research Council (NWO).

Author information

Authors and Affiliations

Intelligent Vehicles Group, Technical University Delft, Delft, The Netherlands
Zimin Xia & Julian F. P. Kooij
TomTom, Amsterdam, The Netherlands
Olaf Booij & Marco Manfredi

Corresponding author

Correspondence to Zimin Xia.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 10263 KB)

Supplementary material 2 (mp4 9523 KB)

About this paper

Cite this paper

Xia, Z., Booij, O., Manfredi, M., Kooij, J.F.P. (2022). Visual Cross-View Metric Localization with Dense Uncertainty Estimates. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13699. Springer, Cham. https://doi.org/10.1007/978-3-031-19842-7_6

DOI: https://doi.org/10.1007/978-3-031-19842-7_6
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19841-0
Online ISBN: 978-3-031-19842-7
eBook Packages: Computer Science, Computer Science (R0)
