Single-target localization in video sequences using offline deep-ranked metric learning and online learned models updating

Huang, Wei; Zeng, Jing; Zhang, Peng; Chen, Guang; Ding, Huijun

doi:10.1007/s11042-018-6042-1

Published: 02 May 2018

Volume 77, pages 28539–28565 (2018)

Multimedia Tools and Applications

Wei Huang¹, Jing Zeng¹, Peng Zhang², Guang Chen³ & Huijun Ding⁴

Abstract

Foreground target localization in video sequences has attracted considerable attention in computer vision over the past few years, and its study is closely tied to machine learning techniques. Driven by the recent rise of deep learning, many contemporary localization methods adopt deep models and benefit substantially from their strong generalization capability. In this study, inspired by deep metric learning, a recent trend in deep learning, a novel single-target localization method is proposed. The method consists of two steps. First, an offline deep-ranked metric learning step is carried out; its gradient in the end-to-end learning procedure of the whole deep model is derived so that the conventional stochastic gradient algorithm can be applied, and an alternative proximal gradient algorithm is introduced to improve efficiency. Second, an online model updating step is employed, in both a consecutive and an incremental updating manner, to make the offline-learned outcome more adaptive as the video sequence progresses, where challenging circumstances such as sudden illumination changes, obstacles, shape transformation, and complex backgrounds are likely to occur. The new method is compared with several shallow-learning-based and deep-learning-based localization methods on a large video database. Comprehensive qualitative and quantitative analyses are conducted to demonstrate the superiority of the new method from a statistical point of view.
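
As a rough illustration of the kind of ranked metric learning the abstract refers to, the sketch below computes a generic triplet ranking hinge loss and takes one stochastic gradient step. It is not the authors' formulation, which the abstract does not specify: the linear embedding matrix `L`, the `margin` value, the learning rate, and the function names `ranking_hinge_loss` and `sgd_step` are all assumptions made for this example, and a plain linear map stands in for the deep network.

```python
import numpy as np


def ranking_hinge_loss(L, anchor, positive, negative, margin=1.0):
    """Triplet ranking hinge loss under a linear embedding x -> L @ x.

    Assumed form (not taken from the paper):
        max(0, margin + ||L(a - p)||^2 - ||L(a - n)||^2)
    Returns the loss value and its gradient with respect to L.
    """
    up = anchor - positive          # anchor-positive difference
    un = anchor - negative          # anchor-negative difference
    dp = L @ up                     # embedded anchor-positive difference
    dn = L @ un                     # embedded anchor-negative difference
    loss = margin + dp @ dp - dn @ dn
    if loss <= 0.0:
        # Ranking constraint already satisfied: zero loss, zero gradient.
        return 0.0, np.zeros_like(L)
    # d/dL ||L u||^2 = 2 (L u) u^T
    grad = 2.0 * (np.outer(dp, up) - np.outer(dn, un))
    return loss, grad


def sgd_step(L, triplet, lr=0.01, margin=1.0):
    """One stochastic gradient step on a single (anchor, positive, negative) triplet."""
    loss, grad = ranking_hinge_loss(L, *triplet, margin=margin)
    return L - lr * grad, loss


if __name__ == "__main__":
    # Toy triplet: the positive and the negative start equally far from the anchor.
    L0 = np.eye(2)
    triplet = (np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0]))
    L1, loss_before = sgd_step(L0, triplet, lr=0.1)
    loss_after, _ = ranking_hinge_loss(L1, *triplet)
    print(loss_before, loss_after)  # the loss drops from 1.0 to about 0.2
```

In this toy case a single step shrinks the anchor-positive distance and stretches the anchor-negative distance, which is the ranking behaviour a metric learner is trained to produce. A proximal gradient variant would follow each gradient step with the proximal operator of a chosen regularizer; that choice is likewise not stated in the abstract.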

Acknowledgements

The authors acknowledge the support of grants 61403182, 61363046, 61301194, and 61231016 from the National Natural Science Foundation of China; the key youth grant 20171ACB21017 from the Natural Science Foundation of Jiangxi Province; the Natural Science Foundation Grant 827/000088 of SZU; the Research Fund for the Doctoral Program of Higher Education of China, grant 20126102120055; and the NWPU foundation grant 3102014JSJ0014.

Author information

Authors and Affiliations

School of Information Engineering, Nanchang University, Nanchang, China
Wei Huang & Jing Zeng
School of Computer Science, Northwestern Polytechnical University, Xi’an, China
Peng Zhang
Xi’an Communications Institute, Xi’an, China
Guang Chen
Guangdong Provincial Key Laboratory of Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
Huijun Ding

Corresponding author

Correspondence to Huijun Ding.

About this article

Cite this article

Huang, W., Zeng, J., Zhang, P. et al. Single-target localization in video sequences using offline deep-ranked metric learning and online learned models updating. Multimed Tools Appl 77, 28539–28565 (2018). https://doi.org/10.1007/s11042-018-6042-1

Received: 06 September 2017
Revised: 12 March 2018
Accepted: 20 April 2018
Published: 02 May 2018
Issue Date: November 2018
DOI: https://doi.org/10.1007/s11042-018-6042-1
