Siamese Centerness Prediction Network for Real-Time Visual Object Tracking

Abstract

Siamese networks have been proven to achieve excellent results in visual object tracking, with SiamFC (the Fully-Convolutional Siamese network) among the most well-known seminal works. Recently, with the successful application of the Region Proposal Network (RPN) in object detection, Siamese networks combined with an RPN have achieved good performance on visual tracking tasks. However, the RPN requires selecting the number, aspect ratios and sizes of the anchor boxes, and these anchor-related parameters more often than not need manual intervention and tuning. In this work, we first add a channel-aware module to the Siamese network to obtain more discriminative features. Thereafter, we propose an anchor-free strategy to replace the RPN module. The proposed framework consists of two networks, namely the Siamese network and the Centerness Prediction Network (CPN); we call the proposed method SiamCPN. In the Siamese network, ResNet50 is used as the backbone. SiamCPN is simple and relatively efficient because it avoids the complicated hyper-parameters associated with anchor boxes. Extensive experiments on four visual tracking benchmarks, OTB100, VOT2016, UAV123 and LaSOT, show that the proposed framework achieves highly competitive performance compared with state-of-the-art trackers. SiamCPN runs at 60 frames per second (FPS) on an AMD processor with two RTX 3090 GPUs.
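
To make the two added components concrete, the sketch below shows, in PyTorch, how a channel-aware (channel-attention) module and an anchor-free centerness prediction head could sit on top of Siamese features. It is only a minimal illustration based on the abstract: the SE-style attention, the depthwise cross-correlation, and all layer sizes are assumptions, not the authors' exact SiamCPN design.

```python
# Minimal, illustrative sketch of an anchor-free Siamese tracking head.
# The channel-attention design, depthwise cross-correlation and layer sizes
# are assumptions for illustration, not the authors' exact SiamCPN layers.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAware(nn.Module):
    """Assumed SE-style channel attention: re-weights feature channels."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))            # (B, C) global context
        return x * w.unsqueeze(-1).unsqueeze(-1)   # channel re-weighting


def xcorr_depthwise(search, kernel):
    """Depthwise cross-correlation between search features and the template."""
    b, c, h, w = search.shape
    out = F.conv2d(search.reshape(1, b * c, h, w),
                   kernel.reshape(b * c, 1, *kernel.shape[2:]),
                   groups=b * c)
    return out.reshape(b, c, out.shape[-2], out.shape[-1])


class CPNHead(nn.Module):
    """Anchor-free head: per-location score, centerness and box offsets."""

    def __init__(self, channels=256):
        super().__init__()
        self.attn = ChannelAware(channels)
        self.cls = nn.Conv2d(channels, 2, 3, padding=1)   # target / background
        self.ctr = nn.Conv2d(channels, 1, 3, padding=1)   # centerness map
        self.reg = nn.Conv2d(channels, 4, 3, padding=1)   # (l, t, r, b) offsets

    def forward(self, template_feat, search_feat):
        corr = self.attn(xcorr_depthwise(search_feat, template_feat))
        return self.cls(corr), self.ctr(corr), self.reg(corr)


# Example shapes: ResNet50-like features from the template and search crops.
template_feat = torch.randn(1, 256, 7, 7)
search_feat = torch.randn(1, 256, 31, 31)
cls_map, ctr_map, box_map = CPNHead(256)(template_feat, search_feat)
```

At inference, the classification map weighted by the centerness map selects the best location on the search region, and the four regressed offsets at that location give the target box directly, so no anchor numbers, scales or aspect ratios need to be chosen or tuned.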

Funding

This work was funded by the National Key R&D Program of China (No. 2021YFF0603904).

Author information

Corresponding author

Correspondence to Chengtao Cai.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, Y., Cai, C. & Yeo, C.K. Siamese Centerness Prediction Network for Real-Time Visual Object Tracking. Neural Process Lett 55, 1029–1044 (2023). https://doi.org/10.1007/s11063-022-10924-4
