Effective long-term tracking with contrast optimizer

  • ORIGINAL PAPER
  • Published in: Machine Vision and Applications (2023)

Abstract

The main challenge of long-term tracking is the data uncertainty inherent in long-term observations. Previous methods tackle long-term tracking with online update-based trackers. However, the sophisticated online update strategies of these trackers usually carry a considerable computational burden. In this work, a contrastive learning-based online optimizer-assisted long-term tracking framework (named LTCO) is proposed to guide the online tracker toward more accurate update decisions while reducing the impact of online updates on tracking speed. Specifically, the optimizer first perceives the similarity between distractors and positive samples through metric learning. Next, contrastive learning between target anchors and hard negative samples forces the optimizer to attend to the differences between targets and distractors. Finally, the optimizer learns a binary output that tells the tracker whether to update. The proposed optimizer can be easily integrated into other online trackers with little impact on their running speed. Extensive experiments show that the method achieves state-of-the-art performance on the VOT2018LT, VOT2019LT, OxUvA, and LaSOT benchmarks while running at real-time speed on a GPU.
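The abstract does not include the authors' implementation, but the update-gating idea it describes can be illustrated with a short sketch. The snippet below is a minimal PyTorch illustration, assuming an InfoNCE-style contrastive loss between a target anchor, a positive sample, and hard negative (distractor) samples, plus a small binary head that scores whether the tracker should update. The module name ContrastOptimizer, the update_gate head, and all dimensions are hypothetical and not taken from the paper.

# Hedged sketch (not the authors' released code): a contrastive update
# optimizer in the spirit of LTCO. Names and dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastOptimizer(nn.Module):
    def __init__(self, feat_dim=256, emb_dim=128, temperature=0.1):
        super().__init__()
        # Small embedding head used for metric/contrastive learning.
        self.embed = nn.Sequential(
            nn.Linear(feat_dim, emb_dim), nn.ReLU(inplace=True),
            nn.Linear(emb_dim, emb_dim),
        )
        # Binary head: probability that the current sample is reliable
        # enough to trigger an online update of the tracker.
        self.update_gate = nn.Sequential(nn.Linear(emb_dim, 1), nn.Sigmoid())
        self.temperature = temperature

    def forward(self, anchor_feat, pos_feat, neg_feats):
        # anchor_feat: (B, feat_dim)    target anchor features
        # pos_feat:    (B, feat_dim)    positive (same-target) features
        # neg_feats:   (B, K, feat_dim) hard negatives (distractors)
        a = F.normalize(self.embed(anchor_feat), dim=-1)
        p = F.normalize(self.embed(pos_feat), dim=-1)
        n = F.normalize(self.embed(neg_feats), dim=-1)

        pos_sim = (a * p).sum(-1, keepdim=True) / self.temperature     # (B, 1)
        neg_sim = torch.einsum("bd,bkd->bk", a, n) / self.temperature  # (B, K)

        # InfoNCE-style loss: pull the anchor toward the positive and
        # push it away from the distractors.
        logits = torch.cat([pos_sim, neg_sim], dim=1)
        labels = torch.zeros(a.size(0), dtype=torch.long, device=a.device)
        contrast_loss = F.cross_entropy(logits, labels)

        # Binary decision used to gate the tracker's online update.
        update_prob = self.update_gate(a).squeeze(-1)                   # (B,)
        return contrast_loss, update_prob

At tracking time, such a gate output could simply be thresholded (for example, update the online model only when update_prob exceeds 0.5); this is one plausible way a learned binary signal could assist a tracker's update decision, under the assumptions stated above.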

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 31171775), the Key Science and Research Program of the Education Department of Henan Province (Grant No. 17A510008), and the Innovative Funds Plan of Henan University of Technology (Grant No. 2020ZKCJ02).

Author information

Corresponding author

Correspondence to Yitao Liang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, Y., Liang, Y. Effective long-term tracking with contrast optimizer. Machine Vision and Applications 34, 70 (2023). https://doi.org/10.1007/s00138-023-01422-1
