Abstract
Siamese networks have drawn great attention in visual tracking in recent years because they strike a good balance between accuracy and speed. However, most Siamese trackers use relatively shallow and narrow backbone networks, such as AlexNet, as feature extractors, which do not take full advantage of deep neural networks. In this paper, we propose a lightweight Siamese network object tracking algorithm based on an efficient attention mechanism to enhance tracking robustness and accuracy. First, we modify MobileNetV2 and use it as our backbone network, which drastically reduces the number of parameters and the amount of computation and speeds up both training and testing. Second, an attention mechanism weights the feature maps along the channel and spatial dimensions to distribute the contributions of the different response maps. Third, features from different levels are fused to obtain more robust results. Experiments on three benchmarks, OTB2015, VOT2018 and TrackingNet, show that our tracker can improve both accuracy and speed.
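The abstract describes the pipeline only at a high level. As a rough illustration, the following toy, pure-Python sketch covers three ingredients it names: channel and spatial attention weighting of feature maps, Siamese cross-correlation between a template and a search region, and weighted fusion of response maps. It assumes an ECA-style channel gate and a simplified CBAM-style spatial gate (a fixed mix of channel mean and max instead of a learned 7×7 convolution). The kernel values, fusion weights and helper names (`channel_attention`, `spatial_attention`, `xcorr`, `fuse`, `argmax2d`) are hypothetical and not taken from the paper, and the MobileNetV2 backbone is omitted entirely.

```python
import math
from typing import List, Sequence, Tuple

Map = List[List[float]]   # one H x W channel (or one response map)
Feature = List[Map]       # C channels, each H x W


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(feat: Feature, kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> Feature:
    """ECA-style gate (assumed): global average pooling per channel, a 1-D
    convolution over neighbouring channels with zero padding, then a sigmoid
    weight per channel. Kernel values are hypothetical; a real net learns them."""
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    pad = len(kernel) // 2
    weights = []
    for c in range(len(pooled)):
        s = sum(w * pooled[c + j - pad]
                for j, w in enumerate(kernel)
                if 0 <= c + j - pad < len(pooled))
        weights.append(sigmoid(s))
    return [[[v * w for v in row] for row in ch] for ch, w in zip(feat, weights)]


def spatial_attention(feat: Feature, a: float = 1.0, b: float = 1.0) -> Feature:
    """Simplified CBAM-style gate (assumed): per-position channel mean and max,
    mixed by hypothetical scalars a and b instead of a learned 7x7 conv."""
    h, w = len(feat[0]), len(feat[0][0])
    gate = [[sigmoid(a * sum(ch[y][x] for ch in feat) / len(feat)
                     + b * max(ch[y][x] for ch in feat))
             for x in range(w)] for y in range(h)]
    return [[[ch[y][x] * gate[y][x] for x in range(w)] for y in range(h)] for ch in feat]


def xcorr(z: Feature, x: Feature) -> Map:
    """SiamFC-style cross-correlation: slide template feature z over search
    feature x and sum the products over all channels and template positions."""
    h, w = len(z[0]), len(z[0][0])
    H, W = len(x[0]), len(x[0][0])
    return [[sum(z[c][i][j] * x[c][oy + i][ox + j]
                 for c in range(len(z)) for i in range(h) for j in range(w))
             for ox in range(W - w + 1)] for oy in range(H - h + 1)]


def fuse(responses: List[Map], weights: Sequence[float]) -> Map:
    """Weighted sum of same-sized response maps (stand-ins for different levels)."""
    h, w = len(responses[0]), len(responses[0][0])
    return [[sum(wt * r[y][x] for r, wt in zip(responses, weights))
             for x in range(w)] for y in range(h)]


def argmax2d(m: Map) -> Tuple[int, int]:
    """(row, col) of the response peak, i.e. the predicted target position."""
    _, y, x = max((v, y, x) for y, row in enumerate(m) for x, v in enumerate(row))
    return y, x


def attend(f: Feature) -> Feature:
    """Channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(f))


# Toy data: 2-channel 6x6 search feature with the "target" at rows 2-3, cols 3-4.
x_feat = [[[1.0 if 2 <= r <= 3 and 3 <= c <= 4 else 0.0 for c in range(6)]
           for r in range(6)] for _ in range(2)]
z_feat = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]  # 2-channel 2x2 template

resp_attended = xcorr(attend(z_feat), attend(x_feat))  # attention-weighted response
resp_plain = xcorr(z_feat, x_feat)                     # unweighted response
fused = fuse([resp_attended, resp_plain], [0.4, 0.6])  # hypothetical fusion weights
print(argmax2d(fused))
```

Because both gates only rescale values by positive factors in this toy input, the fused response still peaks at the target location (row 2, column 3). In a trained tracker the gates are learned, so they can down-weight channels or positions that respond to distractors.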
References
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, pp. 4510–4520. IEEE (2018)
Bolme, D.S., Beveridge, J.R., Draper, B.A.: Visual object tracking using adaptive correlation filters. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), New York, pp. 2544–2550. IEEE (2010)
Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: Exploiting the circulant structure of tracking-by-detection with kernels. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 702–715. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33765-9_50
Henriques, J.F., Caseiro, R., Martins, P.: High-speed tracking with kernelized correlation filters. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 583–596 (2015)
Danelljan, M., Bhat, G., Khan, F.S.: ECO: efficient convolution operators for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, pp. 6638–6646. IEEE (2017)
Danelljan, M., Hager, G., Khan, F.S.: Convolutional features for correlation filter based visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), New York, pp. 58–66. IEEE (2015)
Ma, C., Huang, J., Yang, X.: Hierarchical convolutional features for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), New York, pp. 3074–3082. IEEE (2015)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_56
Guo, Q., Feng, W., Zhou, C.: Learning dynamic siamese network for visual object tracking. In: IEEE International Conference on Computer Vision (ICCV), New York, pp. 1763–1771. IEEE (2017)
He, A., Luo, C., Tian, X.: A twofold Siamese network for real-time object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, pp. 4834–4843. IEEE (2018)
Li, B., Yan, J., Wu, W.: High performance visual tracking with siamese region proposal network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, pp. 8971–8980. IEEE (2018)
Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware siamese networks for visual object tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 103–119. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_7
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, pp. 7132–7141. IEEE (2018)
Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211. Springer, Cham (2018)
Wang, Q., Wu, B., Zhu, P.: ECA-Net: efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, pp. 11534–11542. IEEE (2020)
Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, pp. 7794–7803. IEEE (2018)
Cao, Y., Xu, J., Lin, S., Wei, F., Hu, H.: GCNet: non-local networks meet squeeze-excitation networks and beyond. In: Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), New York, pp. 1971–1980. IEEE (2019)
Howard, A.G., Zhu, M., Chen, B.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017). https://arxiv.org/abs/1704.04861
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, pp. 770–778. IEEE (2016)
Li, P., Chen, B., Ouyang, W.: GradNet: gradient-guided network for visual object tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), New York, pp. 6162–6171. IEEE (2019)
Zhang, Z., Peng, H.: Deeper and wider Siamese networks for real-time visual tracking. In: 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, pp. 4591–4600. IEEE (2019)
Danelljan, M., Hager, G., Khan, F.S.: Learning spatially regularized correlation filters for visual tracking. In: 2015 IEEE International Conference on Computer Vision (ICCV), New York, pp. 4310–4318. IEEE (2015)
Bertinetto, L., Valmadre, J., Golodetz, S.: Staple: complementary learners for real-time tracking. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, pp. 1401–1409. IEEE (2016)
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yu, H., Liu, Q. (2021). Lightweight Object Tracking Algorithm Based on Siamese Network with Efficient Attention. In: Tan, Y., Shi, Y., Zomaya, A., Yan, H., Cai, J. (eds) Data Mining and Big Data. DMBD 2021. Communications in Computer and Information Science, vol 1453. Springer, Singapore. https://doi.org/10.1007/978-981-16-7476-1_41
DOI: https://doi.org/10.1007/978-981-16-7476-1_41
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-7475-4
Online ISBN: 978-981-16-7476-1
eBook Packages: Computer Science, Computer Science (R0)