Abstract
Localizing foreground targets in video sequences has attracted considerable attention in computer vision in recent years, and this research is closely tied to machine learning techniques. Driven by recent advances in deep learning, many contemporary localization methods adopt deep models, and their performance has benefited substantially from the strong generalization capability of these models. In this study, inspired by deep metric learning, a recent trend in deep learning, a novel single-target localization method is proposed. The method consists of two steps. First, an offline deep-ranked metric learning step is performed, and its gradient within the end-to-end learning procedure of the whole deep model is derived so that the conventional stochastic gradient algorithm can be applied. An alternative proximal gradient algorithm is also introduced to improve efficiency. Second, an online model-updating step, combining a consecutive updating manner with an incremental updating manner, makes the offline-learned model more adaptive as the video sequence progresses, since challenging conditions such as sudden illumination changes, occluding obstacles, shape deformation, and complex backgrounds are likely to occur. The new single-target localization method has been compared with several shallow-learning-based and deep-learning-based localization methods on a large video database. Comprehensive qualitative and quantitative analyses demonstrate the superiority of the new method from a statistical point of view.
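The offline step pairs a ranking-based metric loss with stochastic gradient and proximal gradient updates. The following is a minimal sketch of those mechanics under assumptions the abstract does not confirm: a linear embedding matrix `W` stands in for the deep network, the ranking loss is a triplet hinge loss with a hypothetical `margin`, and the proximal variant assumes an L1 penalty, whose proximal operator is soft thresholding (an ISTA-style step). All function names are hypothetical; this is an illustration of the general technique, not the authors' implementation. For a squared embedding distance $\|W u\|^2$, the gradient with respect to $W$ is $2 (W u) u^\top$, which is what `triplet_grad` uses.

```python
# Hypothetical sketch: a linear embedding W replaces the deep network, the
# ranking loss is a triplet hinge loss, and the proximal step assumes an L1
# penalty. None of these choices is taken from the paper.

def matvec(W, x):
    """Return W @ x for a list-of-lists matrix W and a list vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def sq_dist(W, a, b):
    """Squared distance between the embeddings of a and b: ||W (a - b)||^2."""
    diff = [ai - bi for ai, bi in zip(a, b)]
    return sum(v * v for v in matvec(W, diff))

def triplet_loss(W, anchor, pos, neg, margin=1.0):
    """Hinge ranking loss: pos should be closer to anchor than neg by `margin`."""
    return max(0.0, margin + sq_dist(W, anchor, pos) - sq_dist(W, anchor, neg))

def triplet_grad(W, anchor, pos, neg, margin=1.0):
    """(Sub)gradient of triplet_loss with respect to W.

    Uses d/dW ||W u||^2 = 2 (W u) u^T, so an active loss has gradient
    2 (W u_p) u_p^T - 2 (W u_n) u_n^T, with u_p = anchor - pos, u_n = anchor - neg.
    """
    rows, cols = len(W), len(W[0])
    if triplet_loss(W, anchor, pos, neg, margin) == 0.0:
        # Ranking already satisfied: the hinge is inactive, so the gradient is zero.
        return [[0.0] * cols for _ in range(rows)]
    u_p = [a - p for a, p in zip(anchor, pos)]
    u_n = [a - n for a, n in zip(anchor, neg)]
    Wu_p, Wu_n = matvec(W, u_p), matvec(W, u_n)
    return [[2.0 * (Wu_p[i] * u_p[j] - Wu_n[i] * u_n[j]) for j in range(cols)]
            for i in range(rows)]

def sgd_step(W, grad, lr):
    """Plain stochastic gradient step: W - lr * grad."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(W, grad)]

def soft_threshold(v, t):
    """Proximal operator of t * |v| (soft thresholding)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def proximal_step(W, grad, lr, l1):
    """Proximal gradient step for loss + l1 * ||W||_1 (assumed regularizer)."""
    return [[soft_threshold(w, lr * l1) for w in row]
            for row in sgd_step(W, grad, lr)]
```

For example, with `W` as the 2x2 identity, anchor `(0, 0)`, positive `(1, 0)`, and negative `(0, 1)`, the loss is 1.0 and the gradient is `[[2, 0], [0, -2]]`. One SGD step with `lr=0.1` gives `W` of about `[[0.8, 0], [0, 1.2]]`: the positive's direction shrinks, the negative's stretches, and the loss falls to about 0.2. The proximal variant additionally shrinks every entry toward zero by `lr * l1`.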
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-018-6042-1/MediaObjects/11042_2018_6042_Fig1_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-018-6042-1/MediaObjects/11042_2018_6042_Fig2_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-018-6042-1/MediaObjects/11042_2018_6042_Fig3_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-018-6042-1/MediaObjects/11042_2018_6042_Fig4_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-018-6042-1/MediaObjects/11042_2018_6042_Fig5_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-018-6042-1/MediaObjects/11042_2018_6042_Fig6_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-018-6042-1/MediaObjects/11042_2018_6042_Fig7_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-018-6042-1/MediaObjects/11042_2018_6042_Fig8_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-018-6042-1/MediaObjects/11042_2018_6042_Fig9_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-018-6042-1/MediaObjects/11042_2018_6042_Fig10_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-018-6042-1/MediaObjects/11042_2018_6042_Fig11_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-018-6042-1/MediaObjects/11042_2018_6042_Fig12_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-018-6042-1/MediaObjects/11042_2018_6042_Fig13_HTML.gif)
Acknowledgements
The authors acknowledge support from grants 61403182, 61363046, 61301194, and 61231016 of the National Natural Science Foundation of China; key youth grant 20171ACB21017 of the Natural Science Foundation of Jiangxi Province; Natural Science Foundation Grant 827/000088 of SZU; grant 20126102120055 of the Research Fund for the Doctoral Program of Higher Education of China; and foundation grant 3102014JSJ0014 of NWPU.
About this article
Cite this article
Huang, W., Zeng, J., Zhang, P. et al. Single-target localization in video sequences using offline deep-ranked metric learning and online learned models updating. Multimed Tools Appl 77, 28539–28565 (2018). https://doi.org/10.1007/s11042-018-6042-1
DOI: https://doi.org/10.1007/s11042-018-6042-1