AF2S: An Anchor-Free Two-Stage Tracker Based on a Strong SiamFC Baseline

He, Anfeng; Wang, Guangting; Luo, Chong; Tian, Xinmei; Zeng, Wenjun

doi:10.1007/978-3-030-68238-5_42

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12539)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Siamese-network-based trackers have become mainstream in visual object tracking. Recently, several high-performance multi-stage trackers have been proposed, and some of them adopt SiamRPN for first-stage region proposal. We argue that an anchor-based region proposal network is not necessary for the tracking task, because a tracker already has a strong prior on the location and size of the target. In this paper, we propose a two-stage visual tracker that uses SiamFC for region proposal. SiamFC defines a bounding box by its center, which makes it a typical anchor-free (AF) network, so we dub our tracker AF2S. As the model size of SiamFC is only about 1/10 that of SiamRPN, AF2S yields a significantly lighter model than its SiamRPN-based counterparts. In the design of AF2S, we first build a strong AlexNet-based SiamFC baseline, which improves the AUC on OTB-100 from 0.582 to 0.665. Further, we propose a position-sensitive convolutional layer that can be stacked after the SiamFC backbone to increase the robustness of proposals without losing localization precision. Finally, a relation network is used for box refinement. Experimental results show that AF2S achieves the best performance on OTB-100 and VOT-18 among state-of-the-art trackers that use AlexNet as the backbone. On LaSOT-test, AF2S achieves an AUC of 0.480, which places it in the first tier even when trackers with more powerful backbones and much larger model sizes are considered.
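
The anchor-free proposal idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes plain 2-D arrays in place of deep feature maps, ignores stride and scale search, and simply reuses the previous frame's box size as the size prior. The function names `cross_correlate` and `propose_boxes` are hypothetical and chosen only for this illustration.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (2-D lists) and return the response map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            # Similarity score: inner product of the template and the window.
            score = sum(
                search[y + i][x + j] * template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def propose_boxes(response, prev_size, template_size, top_k=1):
    """Turn the top-k response peaks into (cx, cy, w, h, score) proposals.

    Anchor-free: each proposal is defined by its center only; the width and
    height come from the prior (here, assumed to be the previous box size).
    """
    th, tw = template_size
    w, h = prev_size
    scored = [
        (score, x, y)
        for y, row in enumerate(response)
        for x, score in enumerate(row)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    boxes = []
    for score, x, y in scored[:top_k]:
        # Map the window's top-left corner to the window center.
        cx = x + (tw - 1) / 2.0
        cy = y + (th - 1) / 2.0
        boxes.append((cx, cy, w, h, score))
    return boxes


# Toy example: a 2x2 "target" placed inside a 5x5 search region.
search = [[0] * 5 for _ in range(5)]
for yy in (2, 3):
    for xx in (1, 2):
        search[yy][xx] = 1
template = [[1, 1], [1, 1]]

response = cross_correlate(search, template)
boxes = propose_boxes(response, prev_size=(2, 2), template_size=(2, 2))
print(boxes)  # [(1.5, 2.5, 2, 2, 4)]
```

In a two-stage design of the kind the abstract outlines, proposals like these would then be passed to a second stage for box refinement; that stage is not sketched here.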

A. He and G. Wang—This work is carried out while Anfeng He and Guangting Wang are interns in MSRA.

Acknowledgement

This work was supported by the National Key Research and Development Program of China under Grant 2017YFB1002203 and the National Natural Science Foundation of China under Grant 61872329.

Author information

Authors and Affiliations

CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, Anhui, China
Anfeng He, Guangting Wang & Xinmei Tian
Microsoft Research Asia, Beijing, China
Chong Luo & Wenjun Zeng

Corresponding author

Correspondence to Anfeng He.

Editor information

Editors and Affiliations

University of Clermont Auvergne, Clermont Ferrand, France
Adrien Bartoli
Università degli Studi di Udine, Udine, Italy
Andrea Fusiello

About this paper

Cite this paper

He, A., Wang, G., Luo, C., Tian, X., Zeng, W. (2020). AF2S: An Anchor-Free Two-Stage Tracker Based on a Strong SiamFC Baseline. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol 12539. Springer, Cham. https://doi.org/10.1007/978-3-030-68238-5_42

DOI: https://doi.org/10.1007/978-3-030-68238-5_42
Published: 31 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68237-8
Online ISBN: 978-3-030-68238-5
eBook Packages: Computer Science, Computer Science (R0)
