AF2S: An Anchor-Free Two-Stage Tracker Based on a Strong SiamFC Baseline

  • Conference paper
  • First Online:
  • Part of the book: Computer Vision – ECCV 2020 Workshops (ECCV 2020)
  • Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12539)
  • Included in the following conference series: European Conference on Computer Vision
  • 2048 Accesses

Abstract

Siamese network based trackers have become mainstream in visual object tracking. Recently, several high-performance multi-stage trackers have been proposed, and some of them adopt SiamRPN for first-stage region proposal. We argue that an anchor-based region proposal network is not necessary for the tracking task, as a tracker has a strong prior about the location and size of the target. In this paper, we propose a two-stage visual tracker which uses SiamFC for region proposal. SiamFC, which defines a bounding box by its center, is a typical anchor-free (AF) network, so we dub our tracker AF2S. As the model size of SiamFC is only about 1/10 that of SiamRPN, AF2S is a significantly lighter model than its SiamRPN-based counterparts. In the design of AF2S, we first build a strong AlexNet-based SiamFC baseline which improves the AUC on OTB-100 from 0.582 to 0.665. We then propose a position-sensitive convolutional layer which can be stacked after the SiamFC backbone to increase the robustness of proposals without losing localization precision. Finally, a relation network is used for box refinement. Experimental results show that AF2S achieves the best performance on OTB-100 and VOT-18 among state-of-the-art trackers that use AlexNet as the backbone. On LaSOT-test, AF2S achieves an AUC of 0.480, which places it in the first tier even when trackers with more powerful backbones and much larger model sizes are considered.
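
To make the pipeline concrete, the following is a minimal PyTorch-style sketch of the two-stage idea described above: a SiamFC-style cross-correlation stage that yields an anchor-free, center-based proposal, followed by a second stage that refines a cropped proposal. All module names, layer sizes, and the refinement head are illustrative assumptions made for exposition; they are not the authors' implementation, and the position-sensitive convolutional layer and relation network are omitted.

```python
# Minimal sketch of the two-stage idea (stage 1: anchor-free SiamFC-style
# proposal; stage 2: box refinement).  Layer sizes and the refinement head
# are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SiamFCProposal(nn.Module):
    """Stage 1: score map from cross-correlating template and search features."""

    def __init__(self):
        super().__init__()
        # Tiny stand-in for an AlexNet-like backbone, shared by both branches.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3), nn.ReLU(),
        )

    def forward(self, template, search):
        z = self.backbone(template)                  # (B, C, hz, wz)
        x = self.backbone(search)                    # (B, C, hx, wx)
        b, c, hz, wz = z.shape
        # Depth-wise cross-correlation: use each template as a conv kernel.
        x = x.reshape(1, b * c, x.shape[-2], x.shape[-1])
        score = F.conv2d(x, z.reshape(b * c, 1, hz, wz), groups=b * c)
        return score.reshape(b, c, score.shape[-2], score.shape[-1]).mean(1)


class RefineHead(nn.Module):
    """Stage 2 (illustrative): predicts box offsets for one proposal crop."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
            nn.Linear(256, 4),                       # (dx, dy, dw, dh)
        )

    def forward(self, crop):
        return self.net(crop)


if __name__ == "__main__":
    stage1, stage2 = SiamFCProposal(), RefineHead()
    template = torch.randn(1, 3, 127, 127)           # exemplar patch
    search = torch.randn(1, 3, 255, 255)             # search region
    score_map = stage1(template, search)             # (1, Hs, Ws)
    # Anchor-free proposal: the score-map peak gives the target center.
    cy, cx = divmod(int(score_map[0].argmax()), score_map.shape[-1])
    crop = torch.randn(1, 3, 64, 64)                 # crop around (cx, cy) in practice
    print(score_map.shape, (cx, cy), stage2(crop).shape)
```

In the actual tracker, the second-stage crop would be taken around the predicted center in the search image, and the predicted offsets would refine the coarse box produced by stage one.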

A. He and G. Wang: This work was carried out while Anfeng He and Guangting Wang were interns at MSRA.



Acknowledgement

This work was supported by the National Key Research and Development Program of China under Grant 2017YFB1002203 and the National Natural Science Foundation of China under Grant 61872329.

Author information


Corresponding author

Correspondence to Anfeng He.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

He, A., Wang, G., Luo, C., Tian, X., Zeng, W. (2020). AF2S: An Anchor-Free Two-Stage Tracker Based on a Strong SiamFC Baseline. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol. 12539. Springer, Cham. https://doi.org/10.1007/978-3-030-68238-5_42


  • DOI: https://doi.org/10.1007/978-3-030-68238-5_42

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68237-8

  • Online ISBN: 978-3-030-68238-5

  • eBook Packages: Computer Science, Computer Science (R0)
