Abstract
A main challenge of long-term tracking is data uncertainty in long-term observations. Previous methods tackle the task with trackers that update online. However, the sophisticated online update strategies of these trackers usually impose a considerable computational burden. In this work, a contrastive learning-based, online optimizer-assisted long-term tracking framework (named LTCO) is proposed to guide the online tracker toward more accurate update decisions while reducing the impact of online updates on tracking speed. Specifically, the optimizer first perceives the similarity between distractors and positive samples through metric learning. Next, contrastive learning between target anchors and hard negative samples forces the optimizer to notice the difference between targets and distractors. Finally, the optimizer learns a binary output that assists the tracker in deciding when to update. The proposed optimizer can be easily integrated into other online trackers with little impact on their running speed. Extensive experimental results show that the method achieves state-of-the-art performance on the VOT2018LT, VOT2019LT, OxUvA, and LaSOT benchmarks while running at real-time speed on GPU.
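The abstract does not give the loss or the decision rule, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes an InfoNCE-style contrastive loss over cosine similarities and replaces the learned binary output with a fixed similarity threshold. The names `info_nce` and `should_update`, the temperature, the threshold, and the toy feature vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, hard_negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in for the paper's objective).

    The loss is small when the anchor is more similar to the positive
    sample than to every hard negative (distractor).
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in hard_negatives)
    return -math.log(pos / (pos + neg))


def should_update(anchor, candidate, threshold=0.8):
    """Binary update gate (hypothetical: a fixed threshold, not a learned head).

    Returns True when the candidate looks enough like the target anchor
    that an online model update seems safe.
    """
    return cosine(anchor, candidate) >= threshold


# Toy 2-D features, chosen only for illustration.
anchor = [1.0, 0.0]
target_like = [0.9, 0.1]   # close to the anchor
distractor = [0.6, 0.8]    # similar-looking but a different object

# The loss is lower when the true target plays the positive role.
separated = info_nce(anchor, target_like, [distractor])
confused = info_nce(anchor, distractor, [target_like])
print(separated < confused)

# The gate allows an update on the target and blocks it on the distractor.
print(should_update(anchor, target_like), should_update(anchor, distractor))
```

In this sketch the gate is a cheap similarity check, which reflects the abstract's claim that the optimizer adds little overhead to the tracker; the actual optimizer described in the paper learns its binary output rather than thresholding.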
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-023-01422-1/MediaObjects/138_2023_1422_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-023-01422-1/MediaObjects/138_2023_1422_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-023-01422-1/MediaObjects/138_2023_1422_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-023-01422-1/MediaObjects/138_2023_1422_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-023-01422-1/MediaObjects/138_2023_1422_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-023-01422-1/MediaObjects/138_2023_1422_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-023-01422-1/MediaObjects/138_2023_1422_Fig7_HTML.png)
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 31171775), the Key Science and Research Program of the Education Department of Henan Province (Grant No. 17A510008), and the Innovative Funds Plan of Henan University of Technology (Grant No. 2020ZKCJ02).
About this article
Cite this article
Han, Y., Liang, Y. Effective long-term tracking with contrast optimizer. Machine Vision and Applications 34, 70 (2023). https://doi.org/10.1007/s00138-023-01422-1
DOI: https://doi.org/10.1007/s00138-023-01422-1