Abstract
Deep reinforcement learning augments the reinforcement learning framework with the representational power of deep neural networks. Recent works have demonstrated remarkable successes of deep reinforcement learning in various domains, including finance, medicine, healthcare, video games, robotics, and computer vision. In this work, we provide a detailed review of recent and state-of-the-art research advances of deep reinforcement learning in computer vision. We start by reviewing the theories of deep learning, reinforcement learning, and deep reinforcement learning. We then propose a categorization of deep reinforcement learning methodologies and discuss their advantages and limitations. In particular, we divide deep reinforcement learning methods into seven main categories according to their applications in computer vision, i.e., (i) landmark localization; (ii) object detection; (iii) object tracking; (iv) registration of both 2D images and 3D volumetric image data; (v) image segmentation; (vi) video analysis; and (vii) other applications. Each of these categories is further analyzed in terms of reinforcement learning techniques, network design, and performance. Moreover, we provide a comprehensive analysis of the existing publicly available datasets and examine source code availability. Finally, we present some open issues and discuss future research directions for deep reinforcement learning in computer vision.
References
Aaron W, Alan F, Prasad T (2014) Using trajectory data to improve bayesian optimization for reinforcement learning. J Mach Learn Res 15(8):253–282
Abeel P, Ng AY (2004) Apprenticeship learning via inverse reinforcement learning. In Proceedings of the twenty-first international conference on machine learning, pp 1–8. Association for Computing Machinery
Adam C, Pieter A, Andrew YN (2009) Apprenticeship learning for helicopter control. Commun ACM 52(7):97–105
Agogino AK, Tumer K (2004) Unifying temporal and structural credit assignment problems. In Proceedings of the third international joint conference on autonomous agents and multiagent systems–vol 2, pp 980–987. IEEE Computer Society
Al WA, Yun ID (2019) Partial policy-based reinforcement learning for anatomical landmark localization in 3d medical images. IEEE Trans Med Image
Al WA, Yun Io, Lee KJ (2019) Reinforcement learning-based automatic diagnosis of acute appendicitis in abdominal ct. ar**v preprint ar**v:1909.00617
Alaniz S (2018) Deep reinforcement learning with model learning and monte carlo tree search in minecraft. In Conference on reinforcement learning and decision making
Amir A, Ozan O, Yuanwei L, Loic LF, Benjamin H, Ghislain V, Konstantinos K, Athanasios V, Ben G, Bernhard K et al (2019) Evaluating reinforcement learning agents for anatomical landmark detection. Med Image Anal 53:156–164
Amir Shahroudy, Jun Liu, Tian-Tsong Ng, and Gang Wang. Ntu rgb+ d: A large scale dataset for 3d human activity analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1010–1019, 2016
Andersson O, Heintz F, Doherty P (2015) Model-based reinforcement learning in continuous environments using real-time constrained optimization. In AAAI
Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) A brief survey of deep reinforcement learning. ar**v preprint ar**v:1708.05866
Avinash Ramakanth S, Venkatesh Babu R (2014) Seamseg: Video object segmentation using patch seams. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 376–383
Ayle M, Tekli J, El-Zini J, El-Asmar B, Awad M (2020) Bar-a reinforcement learning agent for bounding-box automated refinement
Babaeizadeh M, Frosio I, Tyree S, Clemons J, Kautz J (2016) GA3C: gpu-based A3C for deep reinforcement learning. arxiv:CoRR:abs/1611.06256
Babenko B, Yang M-H, Belongie S (2009) Visual tracking with online multiple instance learning. In 2009 IEEE conference on computer vision and pattern recognition, pp 983–990. IEEE
Bae S-H, Yoon K-J (2014) Robust online multi-object tracking based on tracklet confidence and online discriminative appearance learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1218–1225
Bagnell J (2012) Learning decision: Robustness, uncertainty, and approximation. 04
Bagnell JA, Schneider JG (2001) Autonomous helicopter control using reinforcement learning policy search methods. In Proceedings 2001 ICRA. IEEE international conference on robotics and automation (Cat. No.01CH37164), vol 2, pp 1615–1620
Barron JT (2019) A general and adaptive robust loss function. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4331–4339
Bellver M, Giró-i Nieto X, Marqués F, Torres J (2016) Hierarchical object detection with deep reinforcement learning. ar**v preprint ar**v:1611.03718
Bergmann P, Fauser M, Sattlegger D, Steger C (2019) Mvtec ad a comprehensive real-world dataset for unsupervised anomaly detection. In 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 9584–9592
Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr Philip HS (2016) Fully-convolutional siamese networks for object tracking. In European conference on computer vision, pp 850–865. Springer
Black MJ, Yacoob Y (1995) Tracking and recognizing rigid and non-rigid facial motions using local parametric models of image motion. In Proceedings of IEEE international conference on computer vision, pp 374–381. IEEE
Bloch N, Madabhushi A, Huisman H, Freymann J, Kirby J, Grauer M, Enquobahrie A, Jaffe C, Clarke L, Farahani K (2013) challenge: automated segmentation of prostate structures. Cancer Imag Arch 370:2015
Boedecker J, Springenberg JT, Wlfing J, Riedmiller M (2014) Approximate real-time optimal control based on sparse gaussian process models. In 2014 IEEE symposium on adaptive dynamic programming and reinforcement learning (ADPRL), pp 1–8
Brazil G, Liu X (2019) M3d-rpn: Monocular 3d region proposal network for object detection. In Proceedings of the IEEE international conference on computer vision, Seoul, South Korea,
Bredell G, Tanner C, Konukoglu E (2018) Iterative interaction training for segmentation editing networks. In International workshop on machine learning in medical imaging, pp 363–370. Springer
Buetti-Dinh A, Galli V, Bellenberg S, Ilie O, Herold M, Christel S, Boretska M, Pivkin Igor V, Wilmes P, Sand W, Vera M, Dopson M (2019) Deep neural networks outperform human experts capacity in characterizing bioleaching bacterial biofilm composition. Biotechnol Rep 22:e00321
Busoniu L, Babuska R, De Schutter B (2008) A comprehensive survey of multiagent reinforcement learning. IEEE Trans Syst Man Cybern C 38(2):156–172
Caelles S, Maninis K-K, Pont-Tuset J, Leal-Taixé L, Cremers D, Van Gool L (2017) One-shot video object segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 221–230
Caicedo JC, Lazebnik S (2015) Active object localization with deep reinforcement learning. In Proceedings of the IEEE international conference on computer vision, pp 2488–2496
Carrera D, Manganini F, Boracchi G, Lanzarone E (2017) Defect detection in sem images of nanofibrous materials. IEEE Trans Ind Inf 13(2):551–561
Carsten R, Vladimir K, Andrew B (2004) ‘grabcut’ interactive foreground extraction using iterated graph cuts. ACM Trans Graph 23(3):309–314
Chen B, Wang D, Li P, Wang S, Lu H (2018) Real-time’actor-critic’tracking. In Proceedings of the European conference on computer vision (ECCV), pp 318–334
Cheng J, Tsai Y-H, Wang S, Yang M-H (2017) Segflow: Joint learning for video object segmentation and optical flow. In Proceedings of the IEEE international conference on computer vision, pp 686–695
Cher B, Pyry H, Vincenzo DP, Claudia C, Anthony BA (2017) Detection of axonal synapses in 3d two-photon images. PLoS ONE 12(9):e0183309
Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. ar**v preprint ar**v:1406.1078
Cho K, van Merrienboer B, Gülçehre Ç, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arxiv:CoRR:abs/1406.1078
Choi J, ** Chang H, Yun S, Fischer T, Demiris Y, Young Choi J (2017) Attentional correlation filter network for adaptive visual tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4807–4816
Choi W (2015) Near-online multi-target tracking with aggregated local flow descriptor. In Proceedings of the IEEE international conference on computer vision, pp 3029–3037
Chorowski J, Bahdanau D, Serdyuk D, Cho KH, Bengio Y (2015) Attention-based models for speech recognition. arxiv:CoRR:abs/1506.07503
Chu Q, Ouyang W, Li H, Wang X, Liu B, Yu N (2017) Online multi-object tracking using cnn-based single object tracker with spatial-temporal attention mechanism. In Proceedings of the IEEE international conference on computer vision, pp 4836–4845
Chu W-H, Kitani KM (2020) Neural batch sampling with reinforcement learning for semi-supervised anomaly detection. In European conference on computer vision, pp 751–766
Chu W-S, Song Y, Jaimes A (2015) Video co-summarization: video summarization by visual co-occurrence. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3584–3592
Clavera I, Rothfuss J, Schulman J, Fujita Y, Asfour T, Abbeel P (2018) Model-based reinforcement learning via meta-policy optimization. arxiv:CoRR:abs/1809.05214
Comaniciu D, Ramesh V, Meer P (2000) Real-time tracking of non-rigid objects using mean shift. In Proceedings IEEE conference on computer vision and pattern recognition. CVPR 2000 (Cat. No. PR00662), vol 2, pp 142–149. IEEE
Concetto S, Simone P, Daniela G (2016) Gamifying video object segmentation. IEEE Trans Pattern Anal Mach Intell 39(10):1942–1958
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213–3223
Coulom R (2006) Efficient selectivity and backup operators in monte-carlo tree search. In Proceedings of the 5th international conference on computers and games, pp 72–83
Coumans E, Bai Y (2016) Pybullet, a python module for physics simulation for games, robotics and machine learning
Craig Jordan V (1990) Long-term adjuvant tamoxifen therapy for breast cancer. Breast Cancer Res Treat 15(3):125–136
Criminisi A, Shotton J, Robertson D, Konukoglu E (2010) Regression forests for efficient anatomy detection and localization in ct studies. In International MICCAI workshop on medical computer vision, pp 106–117. Springer
Dai T, Dubois M, Arulkumaran K, Campbell J, Bass C, Billot B, Uslu F, De Paola V, Clopath C, Bharath AA (2019) Deep reinforcement learning for subpixel neural tracking. In International conference on medical imaging with deep learning, pp 130–150
Danelljan Martin, Bhat Goutam, Shahbaz Khan Fahad, Felsberg Michael (2017) Eco: efficient convolution operators for tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6638–6646
Danelljan M, Hager G, Shahbaz Khan F, Felsberg M (2015) Learning spatially regularized correlation filters for visual tracking. In Proceedings of the IEEE international conference on computer vision, pp 4310–4318
Darryl MC, Andrew M, Adnan T, Dominic K, Stuart C (2014) Fully automatic lesion segmentation in breast mri using mean-shift and graph-cuts on a region adjacency graph. J Magn Reson Imaging 39(4):795–804
David S, Guy L, Heess N, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms, Thomas Degris
Deisenroth MP, Englert P, Peters J, Fox D (2014) Multi-task policy search for robotics. In 2014 IEEE international conference on robotics and automation (ICRA), pp 3876–3881
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp 248–255. IEEE
Deng J, Guo J, Xue N, Zafeiriou S (2019) Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4690–4699
Denzler J, Paulus DWR (1994) Active motion detection and object tracking. In Proceedings of 1st international conference on image processing, vol 3, pp 635–639. IEEE
Depraetere B, Liu M, Pinte G, Grondman I, Babuka R (2014) Comparison of model-free and model-based methods for time optimal hit control of a badminton robot. Mechatronics 24(8):1021–1030
De Asis K, Hernandez-Garcia JF, Holland GZ, Sutton RS (2018) Multi-step reinforcement learning: a unifying algorithm. In Thirty-Second AAAI conference on artificial intelligence
DiPietro R, Lea C, Malpani A, Ahmidi N, Vedula SS, Lee GI, Lee MR, Hager GD (2016) Recognizing surgical activities with recurrent neural networks. In International conference on medical image computing and computer-assisted intervention, pp 551–558. Springer
Dollár P, Wojek C, Schiele B, Perona P (2009) Pedestrian detection: a benchmark. In 2009 IEEE conference on computer vision and pattern recognition, pp 304–311. IEEE
Dominik N, Saša G, Matthias J, Nassir N, Joachim H, Razvan I (2014) Probabilistic sparse matching for robust 3d/3d fusion in minimally invasive surgery. IEEE Trans Med Imaging 34(1):49–60
Don M, Anup B (1994) Motion tracking with an active camera. IEEE Trans Pattern Anal Mach Intell 16(5):449–459
Donahue J, Hendricks LA, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T (2014) Long-term recurrent convolutional networks for visual recognition and description. arxiv:CoRR:abs/1411.4389
Dosovitskiy A, Fischer P, Ilg E, Hausser P, Hazirbas C, Golkov V, Van Der Smagt P, Cremers D, Brox T (2015) Flownet: Learning optical flow with convolutional networks. In Proceedings of the IEEE international conference on computer vision, pp 2758–2766
Dosovitskiy A, Ros G, Codevilla F, Lopez A, Koltun V (2017) Carla: an open urban driving simulator. ar**v preprint ar**v:1711.03938
Du Y, Wang W, Wang L (2015) Hierarchical recurrent neural network for skeleton based action recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1110–1118
Dulac-Arnold G, Levine N, Mankowitz DJ, Li J, Paduraru C, Gowal S, Hester T (2020) An empirical investigation of the challenges of real-world reinforcement learning
Dunnhofer M, Martinel N, Luca Foresti G, Micheloni C (2019) Visual tracking by means of deep reinforcement learning and an expert demonstrator. In Proceedings of the IEEE international conference on computer vision workshops
Duong CN, Quach KG, Jalata I, Le N, Luu K (2019) Mobiface: a lightweight deep learning face recognition on mobile devices. In 2019 IEEE 10th international conference on biometrics theory, applications and systems (BTAS), pp 1–6. IEEE
Duong CN, Quach KG, Luu K, Hoang LT, Savvides M, Bui TD (2019) Learning from longitudinal face demonstration–where tractable deep modeling meets inverse reinforcement learning. 127(6–7)
Eddy I, Nikolaus M, Tonmoy S, Margret K, Alexey D, Thomas B (2017) Flownet 2.0: Evolution of optical flow estimation with deep networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2462–2470
El-Fakdi A, Carreras M (2008) Policy gradient based reinforcement learning for real autonomous underwater cable tracking. In 2008 IEEE/RSJ international conference on intelligent robots and systems, pp 3635–3640
Elhamifar E, Sapiro G, Vidal R (2012) See all by looking at a few: Sparse modeling for finding representative objects. In 2012 IEEE conference on computer vision and pattern recognition, pp 1600–1607. IEEE
Erhan D, Szegedy C, Toshev A, Anguelov D (2014) Scalable object detection using deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2147–2154
Everingham M Van Gool L, Williams CKI, Winn J, Zisserman A (2007) The pascal visual object classes challenge 2007 (voc2007) results
Everingham M, Winn J (2011) The pascal visual object classes challenge 2012 (voc2012) development kit. Pattern analysis, statistical modelling and computational learning, Tech. Rep, 8, 2011
Fan H, Lin L, Yang F, Chu P, Deng G, Yu S, Bai H, Xu Y, Liao C, Ling H (2019) Lasot: a high-quality benchmark for large-scale single object tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5374–5383
Fan H, Ling H (2017) Parallel tracking and verifying: a framework for real-time and high accuracy visual tracking. In Proceedings of the IEEE international conference on computer vision, pp 5486–5494
Felix H, Antoine B, Sumit C, Jason W (2015) The goldilocks principle: Reading children’s books with explicit memory representations. arxiv:CoRR:abs/1511.02301
Finn C, Tan XY, Duan Y, Darrell T, Levine S, Abbeel P (2016) Deep spatial autoencoders for visuomotor learning. In: Kragic D, Bicchi A, De Luca A (eds) 2016 IEEE international conference on robotics and automation, ICRA 2016. Stockholm, Sweden, pp 512–519
Florin-Cristian G, Bogdan G, Yefeng Z, Sasa G, Andreas M, Joachim H, Dorin C (2017) Multi-scale deep reinforcement learning for real-time 3d-landmark detection in ct scans. IEEE Trans Pattern Anal Mach Intell 41(1):176–189
FlorinC G, Edward K, Bogdan G, Vivek S, Yefeng Z, Joachim H, Dorin C (2016) Marginal space deep learning: efficient architecture for volumetric image parsing. IEEE Trans Med Imaging 35(5):1217–1228
Fontes DASE, Brandão LAP, da Antonio L Jr, de Albuquerque Araújo A, (2011) Vsumm: a mechanism designed to produce static video summaries and a novel evaluation method. Pattern Ecogn Lett 32(1):56–68
François-Lavet V, Henderson P, Islam R, Bellemare MG, Pineau J (2018) An introduction to deep reinforcement learning. ar**v preprint ar**v:1811.12560
Ganin Y, Kulkarni T, Babuschkin I, Eslami SM, Vinyals O (2018) Synthesizing programs for images using reinforced adversarial learning. ar**v preprint ar**v:1804.01118
Gao H, Zhuang L, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
Gao M, Yu R, Li A, Morariu VI, Davis LS (2018) Dynamic zoom-in network for fast object detection in large images. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6926–6935
Gao Y, Vedula SS, Reiley CE, Ahmidi N, Varadarajan B, Lin HC, Tao L, Zappella L, Béjar B, Yuh DD, et al (2014) Jhu-isi gesture and skill assessment working set (jigsaws): a surgical activity dataset for human motion modeling. In Miccai workshop: M2cai, vol 3, pp 3, 2014
Gauriau R, Cuingnet R, Lesage D, Bloch I (2014) Multi-organ localization combining global-to-local regression and confidence maps. In International conference on medical image computing and computer-assisted intervention, pp 337–344. Springer
Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE conference on computer vision and pattern recognition, pp 3354–3361
Giles M (2017) Mit technology review. Google researchers have reportedly achieved ‘quantum supremacy’. http://www.technologyreview.com/f, 614416
Girshick R (2015) Fast r-cnn. In Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
Gkioxari G, Girshick R, Malik J (2015) Contextual action recognition with r* cnn. In Proceedings of the IEEE international conference on computer vision, pp 1080–1088
Gl M, Chen J, Barron JT, Hasinoff Samuel W, Durand F (2017) Deep bilateral learning for real-time image enhancement. ACM Trans Graph 36(4):1–12
Goel V, Weng J, Poupart P (2018) Unsupervised video object segmentation for deep reinforcement learning. In Advances in neural information processing systems, pp 5683–5694
Gonzalez-Garcia A, Vezhnevets A, Ferrari V (2015) An active search strategy for efficient object class detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3022–3031
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In Advances in neural information processing systems, pp 2672–2680
Graves A, Mohamed A, Hinton GE (2013) Speech recognition with deep recurrent neural networks. arxiv:CoRR:abs/1303.5778
Gubern-Mérida A, Martí R, Melendez J, Hauth JL, Mann RM, Karssemeijer N, Platel B (2015) Automated localization of breast cancer in dce-mri. Med Image Anal 20(1):265–274
Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V (2017) and Aaron C Courville. Improved training of wasserstein gans. In Advances in neural information processing systems, pp 5767–5777
Guo M, Lu J, Zhou J (2018) Dual-agent deep reinforcement learning for deformable face tracking. In Proceedings of the European conference on computer vision (ECCV), pp 768–783
Gupta A, Mendonca R, Liu YX, Abbeel P, Levine S (2018) Meta-reinforcement learning of structured exploration strategies. In Advances in neural information processing systems, pp 5302–5311
Gupta S, Arbelaez P, Malik J (2013) Perceptual organization and recognition of indoor scenes from rgb-d images. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 564–571
Gupta S, Girshick R, Arbeláez P, Malik J (2014) Learning rich features from rgb-d images for object detection and segmentation. In European conference on computer vision, pp 345–360. Springer
Gygli M, Grabner H, Riemenschneider H, Van Gool L (2014) Creating summaries from user videos. In European conference on computer vision, pp 505–520. Springer
Hamid Rezatofighi S, Milan A, Zhang Z, Shi Q, Dick A, Reid I (2015) Joint probabilistic data association revisited. In Proceedings of the IEEE international conference on computer vision, pp 3047–3055
Han J, Yang L, Zhang D, Chang X, Liang X (2018) Reinforcement cutting-agent learning for video object segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9080–9089
Hang X, Hanchuan P (2013) App2: automatic tracing of 3d neuron morphology based on hierarchical pruning of a gray-weighted image distance-tree. Bioinformatics 29(11):1448–1454
Haralick RM, Shapiro LG (1985) Image segmentation techniques. Computer Vision, Graphics, and Image Processing 29(1):100–132
Hare S, Golodetz S, Saffari A, Vineet V, Cheng M-M, Hicks SL, Torr PHS (2015) Struck: structured output tracking with kernels. IEEE Trans Pattern Anal Mach Intell 38(10):2096–2109
Hariharan B, Arbeláez P, Girshick R, Malik J (2014) Simultaneous detection and segmentation. In European conference on computer vision, pp 297–312. Springer
Hariharan B, Arbeláez P, Girshick R, Malik J (2015) Hypercolumns for object segmentation and fine-grained localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 447–456
Haroon I, Imran S, Cody S, Mubarak S (2013) Multi-source multi-scale counting in extremely dense crowd images. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2547–2554
Haroon I, Muhmmad T, Kishan A, Dong Z, Somaya A-M, Nasir R, Mubarak S (2018) Composition loss for counting, density map estimation and localization in dense crowds. In Proceedings of the European conference on computer vision (ECCV), pp 532–546
Hase H, Azampour MF, Tirindelli M, Paschali M, Simson W, Fatemizadeh E, Navab N (2020) Ultrasound-guided robotic navigation with deep reinforcement learning. ar**v preprint ar**v:2003.13321
Hasselt HV (2010) Double q-learning. In Advances in neural information processing systems, pp 2613–2621
Hausknecht MJ, Stone P (2015) Deep recurrent q-learning for partially observable mdps. arxiv:CoRR:abs/1507.06527
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pp 2961–2969
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Henriques JF, Caseiro R, Martins P, Batista J (2014) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596
Hernandez-Leal P, Kaisers M, Baarslag T, de Cote EM (2017) A survey of learning in multiagent environments: Dealing with non-stationarity. arxiv:CoRR:abs/1707.09183
Le Hoang NT, Duong CN, Han L, Luu K, Quach KG, Savvides M (2018) Deep contextual recurrent residual networks for scene labeling. Pattern Recogn 80:32–41
Le Hoang NT, Quach KG, Luu K, Duong CN, Savvides M (2018) Reformulating level sets as deep recurrent neural network approach to semantic segmentation. IEEE Trans Image Process 27(5):2393–2407
Hoiem D, Efros AA, Hebert M (2007) Recovering surface layout from an image. Int J Comput Vision 75(1):151–172
Holliday JB, Le Ngan TH (2020) Follow then forage exploration: improving asynchronous advantage actor critic. In International conference on soft computing, artificial intelligence and applications (SAI 2020), pp 107–118
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In 2018 IEEE/CVF Conference on computer vision and pattern recognition, pp 7132–7141
Hu JF, Zheng WS, Lai J, Zhang J (2015) Jointly learning heterogeneous features for rgb-d activity recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5344–5352
Huang L, Zhao X, Huang K (2019) Got-10k: a large high-diversity benchmark for generic object tracking in the wild. IEEE Trans Pattern Anal Mach Intell
Humpire-Mamani GE, Setio Arnaud AA, van Ginneken B, Jacobs C (2018) Efficient organ localization using multi-label convolutional neural networks in thorax-abdomen ct scans. Phys Med Biol 63(8):085003
Ibanez L, Schroeder W, Ng L, Cates J (2005) The itk software guide: updated for itk version 2:4
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. ar**v preprint ar**v:1502.03167
Jack CR Jr, Bernstein MA, Fox NC, Thompson P, Alexander G, Harvey D, Borowski B, Britson PJ, Whitwell JL, Ward C et al (2008) The alzheimerâ??s disease neuroimaging initiative (adni): Mri methods. J Magn Reson Imag 27(4):685–691
Jaderberg M, Simonyan K, Zisserman A et al (2015) Spatial transformer networks. Advances in neural information processing systems 2017–2025
Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting. In European conference on computer vision, pp 512–528. Springe
Jain A, Powers A, Johnson HJ (2020) Robust automatic multiple landmark detection. In 2020 IEEE 17th international symposium on biomedical imaging (ISBI), pp 1178–1182. IEEE
Jain SD, Grauman K (2014) Supervoxel-consistent foreground propagation in video. In European conference on computer vision, pp 656–671. Springer
Jain SD, **ong B, Grauman K (2017) Fusionseg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos. In 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 2117–2126. IEEE
Jampani V, Gadde R, Gehler PV (2017) Video propagation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 451–461
Jan P, Stefan S (2008) Reinforcement learning of motor skills with policy gradients. Neural Netw 21(4):682–697
Jang WD, Kim C-S (2017) Online video object segmentation via convolutional trident network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5849–5858
Jens Kober J, Bagnell A, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32(11):1238–1274
Jia Z, Yang L, Szepesvari C, Wang M (2020) Model-based reinforcement learning with value-targeted regression. In Proceedings of the 2nd conference on learning for dynamics and control, volume 120 of proceedings of machine learning research, pp 666–686, The Cloud
Jialue F, Wei X, Ying W, Yihong G (2010) Human tracking using convolutional neural networks. IEEE Trans Neural Netw 21(10):1610–1623
Jiang M, Deng C, Pan Z, Wang L, Sun X (2018) Multiobject tracking in videos based on lstm and deep reinforcement learning. Complexity
Jie Z, Liang X, Feng J, ** X, Lu W, Yan S (2016) Tree-structured reinforcement learning for sequential object localization. In Advances in neural information processing systems, pp 127–135
**won A, Sungzoon C (2015) Variational autoencoder based anomaly detection using reconstruction probability. Spec Lect IE 2(1):1–18
Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016
Jun Koh Y, Kim C-S (2017) Primary object segmentation in videos based on region augmentation and reduction. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3442–3450
Justin G, Reza EM (2015) Concurrent markov decision processes for robot team learning. Eng Appl Artif Intell 39:223–234
Jégou S, Drozdzal M, Vazquez D, Romero A, Bengio Y (2017) The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 11–19
Kalchbrenner N, Blunsom P (2013) Recurrent continuous translation models. Association for Computational Linguistics
Kempka M, Wydmuch M, Runc G, Toczek J, Jaśkowski W (2016) Vizdoom: A doom-based ai research platform for visual reinforcement learning. In 2016 IEEE conference on computational intelligence and games (CIG), pp 1–8. IEEE
Keni B, Rainer S (2008) Evaluating multiple object tracking performance: the clear mot metrics. EURASIP J Image Video Process 2008:1–10
Kim KK, Cho SH, Kim HJ, Lee JY (2005) Detecting and tracking moving object using an active camera. In The 7th international conference on advanced communication technology, 2005, ICACT 2005, vol 2, pp 817–820. IEEE
Kirwan D (2010) Nhs fetal anomaly screening programme. National Standards and Guidance for England 18
Klein S, Staring M, Murphy K, Viergever MA, Pluim JPW (2009) Elastix: a toolbox for intensity-based medical image registration. IEEE Trans Med Imaging 29(1):196–205
Konda VR, Tsitsiklis JN (2000) Actor-critic algorithms. In Advances in neural information processing systems, pp 1008–1014
Krebs J, Mansi T, Delingette H, Zhang L, Ghesu FC, Miao S, Maier AK, Ayache N, Rui L, Ali K (2017) Robust non-rigid registration through agent-based action learning. In International conference on medical image computing and computer-assisted intervention, pp 344–352. Springer
Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Cehovin Zajc L, Vojir T, Bhat G, Lukezic A, Eldesokey A, et al (2018) The sixth visual object tracking vot2018 challenge results. In Proceedings of the European conference on computer vision (ECCV)
Kristan M, Matas J, Leonardis A, Felsberg M, Cehovin L, Fernandez G, Vojir T, Hager G, Nebehay G, Pflugfelder R (2015) The visual object tracking vot2015 challenge results. In Proceedings of the IEEE international conference on computer vision workshops, pp 1–23
Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp 1097–1105
Krähenbühl P, Koltun V (2011) Efficient inference in fully connected crfs with gaussian edge potentials. In Advances in neural information processing systems, pp 109–117
Kupcsik A, Deisenroth MP, Peters J, Loh AP, Vadakkepat P, Neumann G (2017) Model-based contextual policy search for data-efficient generalization of robot skills. Artif Intell 247:415–439
Kupcsik A, Deisenroth M, Peters J, Neumann G (2013) Data-efficient generalization of robot skills with contextual policy search. In AAAI
Kurutach T, Clavera I, Duan Y, Tamar A, Abbeel P (2018) Model-ensemble trust-region policy optimization
Le N, Le T, Yamazaki K, Bui TD, Luu K, Savvides M (2020) Offset curves loss for imbalanced problem in medical segmentation. arXiv preprint arXiv:2012.02463
Le N, Yamazaki K, Truong D, Quach KG, Savvides M (2020) A multi-task contextual atrous residual network for brain tumor detection & segmentation. arXiv preprint arXiv:2012.02073
LeCun Y (1998) The mnist database of handwritten digits. http://yann.lecun.com/exdb/mnist
LeCun Y, Bottou L, Orr GB, Müller K-R (1998) Efficient backprop. In Neural networks: Tricks of the trade, pp 9–50. Springer
LeCun Y, Touretzky D, Hinton G, Sejnowski T (1988) A theoretical framework for back-propagation. In Proceedings of the 1988 connectionist models summer school, pp 21–28. CMU, Pittsburgh, Pa: Morgan Kaufmann
Lea C, Flynn MD, Vidal R, Reiter A, Hager GD (2017) Temporal convolutional networks for action segmentation and detection. In proceedings of the IEEE conference on computer vision and pattern recognition, pp 156–165
Lea C, Reiter A, Vidal R, Hager GD (2016) Segmental spatiotemporal cnns for fine-grained action segmentation. In European conference on computer vision, pp 36–52. Springer
Lea C, Vidal R, Hager GD (2016) Learning convolutional action primitives for fine-grained action recognition. In 2016 IEEE international conference on robotics and automation (ICRA), pp 1642–1649. IEEE
Lea C, Vidal R, Reiter A, Hager GD (2016) Temporal convolutional networks: a unified approach to action segmentation. In European conference on computer vision, pp 47–54. Springer
Leal-Taixé L, Canton-Ferrer C, Schindler K (2016) Learning by tracking: Siamese cnn for robust target association. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 33–40
Leal-Taixé L, Fenzi M, Kuznetsova A, Rosenhahn B, Savarese S (2014) Learning an image-based motion context for multiple people tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3542–3549
Leal-Taixé L, Milan A, Reid I, Roth S, Schindler K (2015) Motchallenge 2015: towards a benchmark for multi-target tracking. arXiv preprint arXiv:1504.01942
Lee JW, Park J, Jangmin O, Lee J, Hong E (2007) A multiagent approach to q-learning for daily stock trading. Trans Syst Man Cyber Part A 37(6):864–877
Lee H, Kim HE, Nam H (2019) Srm: a style-based recalibration module for convolutional neural networks. pp 1854–1862
Leibo JZ, Zambaldi VF, Lanctot M, Marecki J, Graepel T (2017) Multi-agent reinforcement learning in sequential social dilemmas. arXiv preprint arXiv:1702.03037
Leo B (1996) Bagging predictors. Mach Learn 24(2):123–140
Leo G (2006) Random walks for image segmentation. IEEE Trans Pattern Anal Mach Intell 28(11):1768–1783
Levine S, Koltun V (2014) Learning complex neural network policies with trajectory optimization. In Proceedings of the 31st international conference on machine learning, pp 829–837
Li B, Yan J, Wu W, Zhu Z, Xiaolin H (2018) High performance visual tracking with siamese region proposal network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8971–8980
Li B, Ouyang W, Sheng L, Zeng X, Wang X (2019) GS3D: an efficient 3d object detection framework for autonomous driving. In IEEE conference on computer vision and pattern recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp 1019–1028. Computer vision foundation/IEEE
Li C, Zhong Q, Xie D, Pu S (2018) Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation. arXiv preprint arXiv:1804.06055
Li D, Chen Q (2020) Deep reinforced attention learning for quality-aware visual recognition. In European conference on computer vision, pp 493–509
Li G, Yu Y (2015) Visual saliency based on multiscale deep features. arXiv preprint arXiv:1503.08663
Li J, Luong MT, Jurafsky D (2015) A hierarchical neural autoencoder for paragraphs and documents. arXiv preprint arXiv:1506.01057
Li K, Rath M, Burdick JW (2018) Inverse reinforcement learning via function approximation for clinical motion analysis. In 2018 IEEE international conference on robotics and automation (ICRA), pp 610–617
Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web, pp 661–670
Li Y, Merialdo B (2010) Multi-video summarization based on video-mmr. In 11th International workshop on image analysis for multimedia interactive services WIAMIS 10, pp 1–4. IEEE
Li Y, Alansary A, Cerrolaza JJ, Khanal B, Sinclair M, Matthew J, Gupta C, Knight C, Kainz B, Rueckert D (2018) Fast multiple landmark localisation using a patch-based iterative network. In International conference on medical image computing and computer-assisted intervention, pp 563–571. Springer
Liang-Chieh C, George P, Iasonas K, Kevin M, Alan L Y (2017) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Liao R, Miao S, de Tournemire P, Grbic S, Kamen A, Mansi T, Comaniciu D (2017) An artificial agent for robust image registration. In Thirty-First AAAI conference on artificial intelligence
Liao X, Li W, Xu Q, Wang X, Jin B, Zhang X, Wang Y, Zhang Y (2020) Iteratively-refined interactive 3d medical image segmentation with multi-agent reinforcement learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9394–9402
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971
Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pp 2980–2988
Lindeberg T (2013) Scale-space theory in computer vision, volume 256. Springer Science & Business Media
Litjens G, Toth R, van de Ven W, Hoeks C, Kerkstra S, van Ginneken B, Vincent G, Guillard G, Birbeck N, Zhang J et al (2014) Evaluation of prostate segmentation algorithms for MRI: the promise12 challenge. Med Image Anal 18(2):359–373
Liu D, Jiang T (2018) Deep reinforcement learning for surgical gesture segmentation and classification. In International conference on medical image computing and computer-assisted intervention, pp 247–255. Springer
Liu H, Socher R, Xiong C (2019) Taming MAML: efficient unbiased meta-reinforcement learning. In International conference on machine learning, pp 4061–4071
Liu L, Hao L, Zou H, Xiong H, Cao Z, Shen C (2020) Weighing counts: sequential crowd counting by reinforcement learning
Liu L, Wu C, Lu J, Xie L, Zhou J, Tian Q (2020) Reinforced axial refinement network for monocular 3d object detection. In European conference on computer vision ECCV, pp 540–556
Liu L, Wang H, Li G, Ouyang W, Lin L (2018) Crowd counting using deep recurrent spatial-aware network. arXiv preprint arXiv:1807.00601
Liu T, Meng Q, Vlontzos A, Tan J, Rueckert D, Kainz B (2020) Ultrasound video summarization using deep reinforcement learning. arXiv preprint arXiv:2005.09531
Liu W, Salzmann M, Fua P (2019) Context-aware crowd counting. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5099–5108
Liu Z, Luo P, Wang X, Tang X (2015) Deep learning face attributes in the wild. In Proceedings of the IEEE international conference on computer vision, pp 3730–3738
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Lorenzi M, Ayache N, Frisoni GB, Pennec X (2013) Alzheimer's Disease Neuroimaging Initiative (ADNI), Lcc-demons: a robust and accurate symmetric diffeomorphic registration algorithm. Neuroimage 81:470–483
Lotfi T, Tang L, Andrews S, Hamarneh G (2013) Improving probabilistic image registration via reinforcement learning and uncertainty evaluation. In International workshop on machine learning in medical imaging, pp 187–194. Springer
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
Luo W, Sun P, Zhong F, Liu W, Zhang T, Wang Y (2017) End-to-end active object tracking via reinforcement learning. arXiv preprint arXiv:1705.10561
Luong T, Sutskever I, Le QV, Vinyals O, Zaremba W (2014) Addressing the rare word problem in neural machine translation. arXiv preprint arXiv:1410.8206
Luu K, Zhu C, Bhagavatula C, Ngan Le TH, Savvides M (2016) A deep learning approach to joint face detection and segmentation. In Advances in face detection and facial image analysis, pp 1–12. Springer
Ma C, Huang JB, Yang X, Yang M-H (2015) Hierarchical convolutional features for visual tracking. In Proceedings of the IEEE international conference on computer vision, pp 3074–3082
Ma K, Wang J, Singh V, Tamersoy B, Chang YJ, Wimmer A, Chen T (2017) Multimodal image registration with deep context reinforcement learning. In International conference on medical image computing and computer-assisted intervention, pp 240–248. Springer
Mahasseni B, Lam M, Todorovic S (2017) Unsupervised video summarization with adversarial lstm networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 202–211
Maicas G, Carneiro G, Bradley AP, Nascimento JC, Reid I (2017) Deep reinforcement learning for active breast lesion detection from dce-mri. In International conference on medical image computing and computer-assisted intervention, pp 665–673. Springer
Mao J, Xu W, Yang Y, Wang J, Yuille AL (2014) Deep captioning with multimodal recurrent neural networks (m-rnn). arXiv preprint arXiv:1412.6632
Martinez-Marin T, Duckett T (2005) Fast reinforcement learning for vision-guided mobile robots. In Proceedings of the 2005 IEEE international conference on robotics and automation, pp 4170–4175
de Marvao A, Dawes-Timothy JW, Shi W, Minas C, Keenan NG, Diamond T, Durighel G, Montana G, Rueckert D, Cook SA et al (2014) Population-based studies of myocardial hypertrophy: high resolution cardiovascular magnetic resonance atlases improve statistical power. J Cardiovasc Magn Reson 16(1):16
Massimiliano P, Angelo C (2017) Head pose estimation in the wild using convolutional neural networks and adaptive gradient methods. Pattern Recogn 71:132–143
Matas J, James S, Davison AJ (2018) Sim-to-real reinforcement learning for deformable object manipulation. arXiv preprint arXiv:1806.07851
Mathe S, Pirinen A, Sminchisescu C (2016) Reinforcement learning for visual object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2894–2902
Matsopoulos GK, Mouravliansky NA, Delibasis KK, Nikita KS (1999) Automatic retinal image registration scheme using global optimization techniques. IEEE Trans Inf Technol Biomed 3(1):47–60
Matteo H, Joseph M, Hado Van H, Tom S, Georg O, Will D, Dan H, Bilal P, Mohammad A, David S (2017) Rainbow: combining improvements in deep reinforcement learning. arXiv preprint arXiv:1710.02298
Menze BH, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J, Burren Y, Porz N, Slotboom J, Wiest R et al (2014) The multimodal brain tumor image segmentation benchmark (brats). IEEE Trans Med Imag 34(10):1993–2024
Miao S, Wang ZJ, Liao R (2016) A cnn regression approach for real-time 2d/3d registration. IEEE Trans Med Imag 35(5):1352–1363
Miao S, Liao R, Pfister M, Zhang L, Ordy V (2013) System and method for 3-d/3-d registration between non-contrast-enhanced cbct and contrast-enhanced ct for abdominal aortic aneurysm stenting. In International conference on medical image computing and computer-assisted intervention, pp 380–387. Springer
Michael FJ, West Jay B (2001) The distribution of target registration error in rigid-body point-based registration. IEEE Trans Med Imaging 20(9):917–927
Mikolov T, Kombrink S, Burget L, Cernocký J, Khudanpur S (2011) Extensions of recurrent neural network language model. In ICASSP, pp 5528–5531
Milan A, Leal-Taixé L, Reid I, Roth S, Schindler K (2016) Mot16: a benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831
Milan A, Leal-Taixé L, Schindler K, Reid I (2015) Joint tracking and segmentation of multiple targets. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5397–5406
Milan A, Rezatofighi SH, Dick A, Reid I, Schindler K (2017) Online multi-target tracking using recurrent neural networks. In Thirty-First AAAI conference on artificial intelligence
Minaee S, Abdolrashidi A, Su H, Bennamoun M, Zhang D (2019) Biometric recognition using deep learning: a survey. arXiv preprint arXiv:1912.00271
Ming-Ming C, Mitra Niloy J, Xiaolei H, Torr Philip HS, Shi-Min H (2014) Global contrast based salient region detection. IEEE Trans Pattern Anal Mach Intell 37(3):569–582
Mingxin J, Tao H, Zhigeng P, Haiyan W, Yinjie J, Chao D (2019) Multi-agent deep reinforcement learning for multi-object tracker. IEEE Access 7:32400–32407
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd international conference on machine learning, pp 1928–1937
Mordatch I, Mishra N, Eppner C, Abbeel P (2016) Combining model-based policy search with online model learning for control of physical humanoids. In 2016 IEEE international conference on robotics and automation (ICRA), pp 242–248
Morimoto J, Zeglin G, Atkeson CG (2003) Minimax differential dynamic programming: application to a biped walking robot. In Proceedings 2003 IEEE/RSJ international conference on intelligent robots and systems (IROS 2003) (Cat. No.03CH37453), vol 2, pp 1927–1932
Morimoto J, Atkeson CG (2009) Nonparametric representation of an approximated poincaré map for learning biped locomotion. In Autonomous robots, pp 131–144
Mousavian A, Anguelov D, Flynn J, Košecká J (2017) 3D bounding box estimation using deep learning and geometry. In 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 5632–5640
Mueller M, Smith N, Ghanem B (2016) A benchmark and simulator for UAV tracking. In European conference on computer vision, pp 445–461. Springer
Märki N, Perazzi F, Wang O, Sorkine-Hornung A (2016) Bilateral space video segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 743–751
Nagabandi A, Clavera I, Liu S, Fearing RS, Abbeel P, Levine S, Finn C (2018) Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. arXiv preprint arXiv:1803.11347
Nair A, McGrew B, Andrychowicz M, Zaremba W, Abbeel P (2018) Overcoming exploration in reinforcement learning with demonstrations. In 2018 IEEE international conference on robotics and automation (ICRA), pp 6292–6299
Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4293–4302
Narges A, Lingling T, Shahin S, Yixin G, Colin L, Bejar HB, Luca Z, Sanjeev K, René V, Hager Gregory D (2017) A dataset and benchmarks for segmentation and recognition of gestures in robotic surgery. IEEE Trans Biomed Eng 64(9):2025–2041
Nassif AB, Shahin I, Attili I, Azzeh M, Shaalan K (2019) Speech recognition using deep neural networks: a systematic review. IEEE Access 7:19143–19165
Navarro F, Sekuboyina A, Waldmannstetter D, Peeken JC, Combs SE, Menze BH (2020) Deep reinforcement learning for organ localization in ct. arXiv preprint arXiv:2005.04974
Neil B, Nicholas HA, Darcie Thomas E (2008) Minimal-bracketing sets for high-dynamic-range image capture. IEEE Trans Image Process 17(10):1864–1875
Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng AY (2011) Reading digits in natural images with unsupervised feature learning
Ng AY, Russell SJ (2000) Algorithms for inverse reinforcement learning. In Proceedings of the seventeenth international conference on machine learning, ICML ’00, pp 663–670, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc
Ng AY, Russell SJ, et al (2000) Algorithms for inverse reinforcement learning. In ICML, vol 1
Nguyen TT, Li Z, Silander T, Leong T-Y (2013) Online feature selection for model-based reinforcement learning. In Proceedings of the 30th international conference on machine learning, vol 28, pp I-498–I-506
Nhan Duong C, Quach KG, Luu K, Le N, Savvides M (2017) Temporal non-volume preserving approach to facial age-progression and age-invariant face recognition. In Proceedings of the IEEE international conference on computer vision, pp 3735–3743
Nicolas S, Kenji D (2003) Meta-learning in reinforcement learning. Neural Netw 16(1):5–9
Niedzwiedz C, Elhanany I, Liu Z, Livingston S (2008) A consolidated actor-critic model with function approximation for high-dimensional pomdps. In AAAI 2008 workshop for advancement in POMDP solvers
Ning Y, He S, Zhiyong W, Xing C, Zhang L-J (2019) A review of deep learning based speech synthesis. Appl Sci 9(19)
Noam B, Tuomas S (2019) Superhuman ai for multiplayer poker. Science 365(6456):885–890
Okuma K, Taleghani A, De Freitas N, Little JJ, Lowe DG (2004) A boosted particle filter: multitarget detection and tracking. In European conference on computer vision, pp 28–39. Springer
Olga R, Jia D, Hao S, Jonathan K, Sanjeev S, Sean M, Zhiheng H, Andrej K, Aditya K, Michael B et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vision 115(3):211–252
Orlando JI, Fu H, Breda JB, van Keer K, Bathula DR, Diaz-Pinto A, Fang R, Heng P-A, Kim J, Lee JH, et al (2020) Refuge challenge: a unified framework for evaluating automated methods for glaucoma assessment from fundus photographs. Med Image Anal 59:101570
Osa T, Pajarinen J, Neumann G, Bagnell JA, Abbeel P, Peters J (2018)
Papazoglou A, Ferrari V (2013) Fast object segmentation in unconstrained video. In Proceedings of the IEEE international conference on computer vision, pp 1777–1784
Paschalidis IC, Li K, Moazzez Estanjini R (2009) An actor-critic method using least squares temporal difference learning. In Proceedings of the 48th IEEE conference on decision and control (CDC) held jointly with 2009 28th Chinese Control Conference, pp 2564–2569
Peixia L, Dong W, Lijun W, Huchuan L (2018) Deep visual tracking: Review and experimental comparison. Pattern Recogn 76:323–338
Peng H, Ruan Z, Long F, Simpson JH, Myers EW (2010) V3d enables real-time 3d visualization and quantitative analysis of large-scale biological image data sets. Nat Biotechnol 28(4):348–353
Pengpeng L, Erik B, Haibin L (2015) Encoding color information for visual tracking: Algorithms and benchmark. IEEE Trans Image Process 24(12):5630–5644
Pengyu Z, Dong W, Lu H (2020) Multi-modal visual tracking: review and experimental comparison
Perazzi F, Khoreva A, Benenson R, Schiele B, Sorkine-Hornung A (2017) Learning video object segmentation from static images. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2663–2672
Perazzi F, Pont-Tuset J, McWilliams B, Van Gool L, Gross M, Sorkine-Hornung A (2016) A benchmark dataset and evaluation methodology for video object segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 724–732
Philippe T, Michael U (2000) Optimization of mutual information for multiresolution image registration. IEEE Trans Image Process 9(12):2083–2099
Pieter A, Adam C, Andrew YN (2010) Autonomous helicopter aerobatics through apprenticeship learning. Int J Robot Res 29(13):1608–1639
Pirinen A, Sminchisescu C (2018) Deep reinforcement learning of region proposal networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6945–6954
Pirsiavash H, Ramanan D, Fowlkes CC (2011) Globally-optimal greedy algorithms for tracking a variable number of objects. In CVPR 2011, pp 1201–1208. IEEE
Plaat A, Kosters W, Preuss M (2020) Deep model-based reinforcement learning for high-dimensional problems, a survey
Potapov D, Douze M, Harchaoui Z, Schmid C (2014) Category-specific video summarization. In European conference on computer vision, pp 540–555. Springer
Pourreza-Shahri R, Kehtarnavaz N (2015) Exposure bracketing via automatic exposure selection. In 2015 IEEE international conference on image processing (ICIP), pp 320–323. IEEE
Prest A, Leistner C, Civera J, Schmid C, Ferrari V (2012) Learning object class detectors from weakly annotated video. In 2012 IEEE Conference on computer vision and pattern recognition, pp 3282–3289. IEEE
Qi Y, Zhang S, Qin L, Yao H, Huang Q, Lim J, Yang M-H (2016) Hedged deep tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4303–4311
Rakelly K, Zhou A, Finn C, Levine S, Quillen D (2019) Efficient off-policy meta-reinforcement learning via probabilistic context variables. In International conference on machine learning, pp 5331–5340
Panda R, Das A, Wu Z, Ernst J, Roy-Chowdhury AK (2017) Weakly supervised summarization of web videos. In Proceedings of the IEEE international conference on computer vision, pp 3657–3666
Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767
Ren L, Lu J, Wang Z, Tian Q, Zhou J (2018) Collaborative deep reinforcement learning for multi-object tracking. In Proceedings of the European conference on computer vision (ECCV), pp 586–602
Ren L, Yuan X, Lu J, Yang M, Zhou J (2018) Deep reinforcement learning with iterative shift for visual tracking. In Proceedings of the European conference on computer vision (ECCV), pp 684–700
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pp 91–99
Reza M, Kosecka J, et al (2016) Reinforcement learning for semantic segmentation in indoor scenes. arXiv preprint arXiv:1606.01178
Richard A, Gall J (2016) Temporal action detection using a statistical language model. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3131–3140
Rochan M, Ye L, Wang Y (2018) Video summarization using fully convolutional sequence networks. In Proceedings of the European conference on computer vision (ECCV), pp 347–363
Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In International conference on medical image computing and computer-assisted intervention, pp 234–241. Springer
Ros G, Koltun V, Codevilla F, Lopez A (2019) The carla autonomous driving challenge
Rotman D (2013) Meet the man with a cheap and easy plan to stop global warming. MIT Technology Review. http://www.technologyreview.com/featuredstory/511016/a-cheap-and-easy-plan-to-stop-globalwarming
Rouet J-M, Jacq J-J, Roux C (2000) Genetic algorithms for a robust 3-d mr-ct registration. IEEE Trans Inf Technol Biomed 4(2):126–136
Rumelhart DE (1998) The architecture of mind: a connectionist approach. Mind Read pp 207–238
Runarsson TP, Lucas SM (2012) Imitating play from game trajectories: Temporal difference learning versus preference learning. In 2012 IEEE conference on computational intelligence and games (CIG), pp 79–82
Sadeghian A, Alahi A, Savarese S (2017) Tracking the untrackable: Learning to track multiple cues with long-term dependencies. In Proceedings of the IEEE international conference on computer vision, pp 300–311
Sahba F (2016) Deep reinforcement learning for object segmentation in video sequences. In 2016 International conference on computational science and computational intelligence (CSCI), pp 857–860. IEEE
Sahba F, Tizhoosh HR, Salama MMA (2006) A reinforcement learning framework for medical image segmentation. In The 2006 IEEE international joint conference on neural network proceedings, pp 511–517. IEEE
Sahba F, Tizhoosh HR, Salama MMA (2007) Application of opposition-based reinforcement learning in image segmentation. In 2007 IEEE symposium on computational intelligence in image and signal processing, pp 246–251. IEEE
Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In International conference on machine learning, pp 1889–1897
Schulman J, Levine S, Moritz P, Jordan MI, Abbeel P (2015) Trust region policy optimization. arXiv e-prints
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347
Sefati S, Cowan NJ, Vidal R (2015) Learning shared, discriminative dictionaries for surgical gesture segmentation and classification. In MICCAI workshop: M2CAI, vol 4
Sepp H, Jürgen S (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Seung-Hwan B, Kuk-Jin Y (2017) Confidence-based data association and discriminative deep appearance learning for robust online multi-object tracking. IEEE Trans Pattern Anal Mach Intell 40(3):595–610
Shafiee MJ, Chywl B, Li F, Wong A (2017) Fast yolo: a fast you only look once system for real-time embedded object detection in video. arXiv preprint arXiv:1709.05943
Shaker MR, Yue S, Duckett T (2009) Vision-based reinforcement learning using approximate policy iteration. In 2009 international conference on advanced robotics, pp 1–6
Shalabh B, Sutton Richard S, Mohammad G, Mark L (2009) Natural actor-critic algorithms. Automatica 45(11):2471–2482
Shalev-Shwartz S, Shammah S, Shashua A (2016) Safe, multi-agent, reinforcement learning for autonomous driving. arXiv preprint arXiv:1610.03295
Shen J, Zafeiriou S, Chrysos GG, Kossaifi J, Tzimiropoulos G, Pantic M (2015) The first facial landmark tracking in-the-wild challenge: Benchmark and results. In Proceedings of the IEEE international conference on computer vision workshops, pp 50–58
Shi Y, Cui L, Qi Z, Meng F, Chen Z (2016) Automatic road crack detection using random structured forests. IEEE Trans Intell Transp Syst 17(12):3434–3445
Silberman N, Hoiem D, Kohli P, Fergus R (2012) Indoor segmentation and support inference from rgbd images. In European conference on computer vision, pp 746–760. Springer
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Sindagi VA, Patel VM (2019) Multi-level bottom-top and top-bottom feature fusion for crowd counting. In Proceedings of the IEEE international conference on computer vision, pp 1002–1012
Singh SP, Wang L, Gupta S, Goli H, Padmanabhan P, Gulyás B (2020) 3d deep learning on medical images: a review
Song G, Myeong H, Lee KM (2018) Seednet: automatic seed generation with deep reinforcement learning for robust interactive segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1760–1768
Song Y, Vallmitjana J, Stent A, Jaimes A (2015) Tvsum: summarizing web videos using titles. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5179–5187
Song Y, Ma C, Gong L, Zhang J, Lau RWH, Yang M-H (2017) Crest: Convolutional residual learning for visual tracking. In Proceedings of the IEEE international conference on computer vision, pp 2555–2564
Stadie BC, Abbeel P, Sutskever I (2017) Third-person imitation learning. arXiv preprint arXiv:1703.01703
Subramanian J, Mahajan A (2019) Reinforcement learning in stationary mean-field games, pp 251–259. International foundation for autonomous agents and multiagent systems
Sun S, Hu J, Yao M, Hu J, Yang X, Song Q, Wu X (2018) Robust multimodal image registration using deep recurrent reinforcement learning. In Asian conference on computer vision, pp 511–526. Springer
Sundararajan K, Woodard DL (2018) Deep learning for biometrics: a survey. 51:3
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press
Sutton RS, McAllester D, Singh S, Mansour Y (1999) Policy gradient methods for reinforcement learning with function approximation. In Proceedings of the 12th international conference on neural information processing systems, NIPS’99, pp 1057–1063
Sutton RS, McAllester DA, Singh SP, Mansour Y (2000) Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems vol 12, pp 1057–1063
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-first AAAI conference on artificial intelligence
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Szegedy C, Toshev A, Erhan D (2013) Deep neural networks for object detection. In Advances in neural information processing systems, pp 2553–2561
Sæmundsson S, Hofmann K, Deisenroth MP (2018) Meta reinforcement learning with latent variable gaussian processes. arXiv preprint arXiv:1803.07551
Tang Y, Tian Y, Lu J, Li P, Zhou J (2018) Deep progressive reinforcement learning for skeleton-based action recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5323–5332
Tao R, Gavves E, Smeulders AWM (2016) Siamese instance search for tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1420–1429
Tian Z, Si X, Zheng Y, Chen Z, Li X (2020) Multi-step medical image segmentation based on reinforcement learning. J Ambient Intell Human Comput
Tianyang X, Zhen-Hua F, Xiao-Jun W, Josef K (2019) Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking. IEEE Trans Image Process 28(11):5596–5609
Todd H, Michael Q, Peter S (2011) A real-time model-based reinforcement learning architecture for robot control. arXiv preprint arXiv:1105.1749
Toro OJ, Müller H, Krenn M, Gruenberg K, Taha AA, Winterstein M, Eggel I, Foncubierta-Rodríguez A, Goksel O, Jakab A et al (2016) Cloud-based evaluation of anatomical structure segmentation and landmark detection algorithms: Visceral anatomy benchmarks. IEEE Trans Med Imaging 35(11):2459–2475
Toromanoff M, Wirbel E, Moutarde F (2020) End-to-end model-free reinforcement learning for urban driving using implicit affordances. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7153–716
Toshev A, Szegedy C (2014) Deeppose: Human pose estimation via deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1653–1660
Tsai Y-H, Yang M-H, Black MJ (2016) Video segmentation via object flow. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3899–3908
Tsurumine Y, Cui Y, Yamazaki K, Matsubara K (2019) Generative adversarial imitation learning with deep p-network for robotic cloth manipulation. In 2019 IEEE-RAS 19th international conference on humanoid robots (humanoids), pp 274–280
Uijlings JRR, Van De Sande KEA, Gevers T, Smeulders AWM (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171
Uzkent B, Yeh C, Ermon S (2020) Efficient object detection in large images using deep reinforcement learning. In The IEEE winter conference on applications of computer vision, pp 1824–1833
Valmadre J, Bertinetto L, Henriques J, Vedaldi A, Torr PHS (2017) End-to-end representation learning for correlation filter based tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2805–2813
Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double q-learning. In Thirtieth AAAI conference on artificial intelligence
Van Hove L (2001) Optimal denominations for coins and bank notes: in defense of the principle of least effort. J Money Credit Bank pp 1015–1021
Vecchio G, Palazzo S, Giordano D, Rundo F, Spampinato C (2020) Mask-rl: Multiagent video object segmentation framework through reinforcement learning. IEEE Trans Neural Netw Learn Syst
Vijayanarasimhan S, Ricco S, Schmid C, Sukthankar R, Fragkiadaki K (2017) Sfm-net: learning of structure and motion from video. arXiv preprint arXiv:1704.07804
Vinyals O, Babuschkin I, Chung J, Mathieu M, Jaderberg M, Czarnecki W, Dudzik A, Huang A, Georgiev P, Powell R, Ewalds T, Horgan D, Kroiss M, Danihelka I, Agapiou J, Oh J, Dalibard V, Choi D, Sifre L, Sulsky Y, Vezhnevets S, Molloy J, Cai T, Budden D, Paine T, Gulcehre C, Wang Z, Pfaff T, Pohlen T, Yogatama D, Cohen J, McKinney K, Smith O, Schaul T, Lillicrap T, Apps C, Kavukcuoglu K, Hassabis D, Silver D (2019) AlphaStar: mastering the real-time strategy game StarCraft II. https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/
Vlontzos A, Alansary A, Kamnitsas K, Rueckert D, Kainz B (2019) Multiple landmark detection using multi-agent reinforcement learning. In International conference on medical image computing and computer-assisted intervention, pp 262–270
Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
Wang G, Zuluaga MA, Li W, Pratt R, Patel PA, Aertsen M, Doel T, David AL, Deprest J, Ourselin S, et al (2018) Deepigeos: a deep interactive geodesic framework for medical image segmentation. IEEE Trans Pattern Anal Mach Intell 41(7):1559–1572
Wang H, Wang Y, Zhou Z, Ji X, Gong D, Zhou J, Li Z, Liu W (2018) Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5265–5274
Wang JX, Kurth-Nelson Z, Tirumala D, Soyer H, Leibo JZ, Munos R, Blundell C, Kumaran D, Botvinick M (2016) Learning to reinforcement learn. arXiv preprint arXiv:1611.05763
Wang L, Lu H, Ruan X, Yang M-H (2015) Deep networks for saliency detection via local estimation and global search. In 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 3183–3192. IEEE
Wang M, Deng W (2020) Mitigating bias in face recognition using skewness-aware reinforcement learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9322–9331
Wang M, Deng W, Hu J, Tao X, Huang Y (2019) Racial faces in the wild: Reducing racial bias by information maximization adaptation network. In Proceedings of the IEEE international conference on computer vision, pp 692–702
Wang N, Yeung D-Y (2013) Learning a deep compact image representation for visual tracking. In Advances in neural information processing systems, pp 809–817
Wang T, Bao X, Clavera I, Hoang J, Wen Y, Langlois E, Zhang S, Zhang G, Abbeel P, Ba J (2019) Benchmarking model-based reinforcement learning. arXiv preprint arXiv:1907.02057
Wang Y, Dong M, Shen J, Wu Y, Cheng S, Pantic M (2020) Dynamic face video segmentation via reinforcement learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6959–6969
Wang Z, Zhang J, Lin M, Wang J, Luo P, Ren J (2020) Learning a reinforced agent for flexible exposure bracketing selection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1820–1828
Wang Z, Schaul T, Hessel M, Van Hasselt H, Lanctot M, De Freitas N (2015) Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581
Weiming H, Xi L, Wenhan L, Xiaoqin Z, Stephen M, Zhongfei Z (2012) Single and multiple object tracking using log-euclidean riemannian subspace and block-division appearance model. IEEE Trans Pattern Anal Mach Intell 34(12):2420–2440
Wickelgren WA (1973) The long and the short of memory. Psychol Bull 80(6):425
Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8(3–4):229–256
Wirth C, Fürnkranz J (2015) On learning from game annotations. IEEE Trans Comput Intell AI Games 7(3):304–316
Wohlhart P, Lepetit V (2015) Learning descriptors for object recognition and 3d pose estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3109–3118
Woo S, Park J, Lee J-Y, Kweon IS (2018) Cbam: Convolutional block attention module. In European conference on computer vision, pp 3–19
Wu Y, Lim J, Yang M-H (2013) Online object tracking: a benchmark. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2411–2418
Xia L, Chen C-C, Aggarwal JK (2012) View invariant human action recognition using histograms of 3d joints. In 2012 IEEE computer society conference on computer vision and pattern recognition workshops, pp 20–27. IEEE
Xiahai Z, Juan S (2016) Multi-scale patch and multi-modality atlases for whole heart segmentation of mri. Med Image Anal 31:77–87
Xiang S, Li H (2017) On the effects of batch and weight normalization in generative adversarial networks. arXiv preprint arXiv:1704.03971
Xiang Y, Alahi A, Savarese S (2015) Learning to track: online multi-object tracking by decision making. In Proceedings of the IEEE international conference on computer vision, pp 4705–4713
Xiao F, Lee YJ (2016) Track and segment: an iterative unsupervised approach for video object proposals. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 933–942
Xie Q, Luong M-T, Hovy E, Le QV (2020) Self-training with noisy student improves imagenet classification. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10687–10698
Xiong H, Lu H, Liu C, Liu L, Cao Z, Shen C (2019) From open set to closed set: Counting objects by spatial divide-and-conquer. In Proceedings of the IEEE international conference on computer vision, pp 8362–8371
Xu H, Su F (2015) Robust seed localization and growing with deep convolutional features for scene text detection. In Proceedings of the 5th ACM on international conference on multimedia retrieval, pp 387–394. ACM
Xu N, Price B, Cohen S, Yang J, Huang TS (2016) Deep interactive object selection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 373–381
Xu Y-S, Fu T-J, Yang H-K, Lee C-Y (2018) Dynamic video segmentation network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6556–6565
Xuanang X, Fugen Z, Bo L, Dongshan F, Xiangzhi B (2019) Efficient multiple organ localization in ct image using 3d region proposal network. IEEE Trans Med Imaging 38(8):1885–1898
Yamakazi K, Viet-Khoa Vo-Ho AS, Le NTH, Tran T (2021) Agent-environment network for temporal action proposal generation. In International conference on acoustics, speech and signal processing
Yamazaki K, Rathour VS, Le T (2021) Invertible residual network with regularization for effective medical image segmentation. arXiv preprint arXiv:2103.09042
Yan W, Lei Z, Lituan W, Zizhou W (2018) Multitask learning for object localization with deep reinforcement learning. IEEE Trans Cogn Dev Syst 11(4):573–580
Yan Z, Yuan Y, Zuo W, Tan X, Wang Y, Wen S, Ding E (2019) Perspective-guided convolution networks for crowd counting. In Proceedings of the IEEE international conference on computer vision, pp 952–961
Yang Z, Huang L, Chen Y, Wei Z, Ahn S, Zelinsky G, Samaras D, Hoai M (2020) Predicting goal-directed human attention using inverse reinforcement learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Yi W, Jongwoo L, Ming-Hsuan Y (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):1834–1848
Yong KD, Moongu J (2014) Data fusion of radar and image measurements for multi-object tracking via kalman filtering. Inf Sci 278:641–652
Yoon JH, Lee CR, Yang MH, Yoon KJ (2016) Online multi-object tracking via structural constraint event aggregation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1392–1400
Yoshihisa T, Yunduan C, Eiji U, Takamitsu M (2019) Deep reinforcement learning with smooth policy update: application to robotic cloth manipulation. Robot Auton Syst 112:72–83
Yoshua B, Patrice S, Paolo F (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166
You C, Lu J, Filev D, Tsiotras P (2019) Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning. Robot Auton Syst 114:1–18
Yu C, Liu J, Nemati S (2019) Reinforcement learning in healthcare: a survey. arXiv preprint arXiv:1908.08796
Yu T, Quillen D, He Z, Julian R, Hausman K, Finn C, Levine S (2020) Meta-world: a benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on robot learning, pp 1094–1100
Yun S, Choi J, Yoo Y, Yun K, Choi JY (2017) Action-decision networks for visual tracking with deep reinforcement learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2711–2720
Yunliang C, Said O, Manas S, Mark L, Shuo L (2015) Multi-modality vertebra recognition in arbitrary views using 3D deformable hierarchical model. IEEE Trans Med Imaging 34(8):1676–1693
Yushi C, ** J (2015) Spectral-spatial classification of hyperspectral data based on deep belief network. IEEE J Select Top Appl Earth Observ Remote Sens 8(6):2381–2392
Zdenek K, Krystian M, Jiri M (2011) Tracking-learning-detection. IEEE Trans Pattern Anal Mach Intell 34(7):1409–1422
Zengyi Q, Jinglu W, Yan L (2019) Monogrnet: a geometric reasoning network for monocular 3d object localization. Proc AAAI Confer Artific Intell 33(01):8851–8858
Zha D, Lai K-H, Zhou K, Hu X (2019) Experience replay optimization. arXiv preprint arXiv:1906.08387
Zhang J, Li W, Ogunbona PO, Wang P, Tang C (2016) Rgb-d-based action recognition datasets: a survey. Pattern Recogn 60:86–105
Zhang-Wei H, Chen Yu-M, Shih-Yang S, Tzu-Yun S, Yi-Hsiang C, Hsuan-Kung Y, Brian Hsi-Lin H, Chih-Chieh T, Yueh-Chuan C, Tsu-Ching H, et al (2018) Virtual-to-real: learning to control in visual semantic segmentation. arXiv preprint arXiv:1802.00285
Zhang D, Maei H, Wang X, Wang Y-F (2017) Deep reinforcement learning for visual object tracking in videos. arXiv preprint arXiv:1701.08936
Zhang D, Yang L, Meng D, Xu D, Han J (2017) Spftn: a self-paced fine-tuning network for segmenting objects in weakly labelled videos. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4429–4437
Zhang K, Chao W-L, Sha F, Grauman K (2016) Video summarization with long short-term memory. In European conference on computer vision, pp 766–782. Springer
Zhang Y, Zhou D, Chen S, Gao S, Ma Y (2016) Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 589–597
Zhao H, Qi X, Shen X, Shi J, Jia J (2018) Icnet for real-time semantic segmentation on high-resolution images. In Proceedings of the European conference on computer vision (ECCV), pp 405–420
Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2881–2890
Zheng Y, Liu D, Georgescu B, Nguyen H, Comaniciu D (2015) 3d deep learning for efficient and robust landmark detection in volumetric data. In International conference on medical image computing and computer-assisted intervention, pp 565–572. Springer
Zhewei H, Wen H, Shuchang Z (2019) Learning to paint with model-based deep reinforcement learning. In Proceedings of the IEEE international conference on computer vision, pp 8709–8718
Zhiheng H, Wei X, Kai Y (2015) Bidirectional lstm-crf models for sequence tagging. arXiv preprint arXiv:1508.01991
Zhiwu H, Chengde W, Thomas P, Van Gool L (2017) Deep learning on lie groups for skeleton-based action recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6099–6108
Zhong-Qiu Z, Shou-Tao X, Dian L, Wei-Dong T, Zhi-Da J (2019) A review of image set classification. Neurocomputing 335:251–260
Zhou B, Zhao H, Puig X, Fidler S, Barriuso A, Torralba A (2017) Scene parsing through ade20k dataset. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 633–641
Zhou K, Qiao Y, Xiang T (2018) Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. In Thirty-Second AAAI conference on artificial intelligence
Zhou K, Xiang T, Cavallaro A (2018) Video summarisation by classification with deep reinforcement learning. arXiv preprint arXiv:1807.03089
Zhu X, Xiong Y, Dai J, Yuan L, Wei Y (2017) Deep feature flow for video recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2349–2358
Zou WY, Wang X, Sun X, Lin Y (2014) Generic object detection with dense neural patterns and regionlets. arXiv preprint arXiv:1404.4316
van Beek P (2018) Improved image selection for stack-based hdr imaging. arXiv preprint arXiv:1806.07420
van Hasselt H, Guez A, Silver D (2015) Deep reinforcement learning with double q-learning. arXiv preprint arXiv:1509.06461
Acknowledgements
This material is based upon work supported by the National Science Foundation under Award No OIA-1946391.
Cite this article
Le, N., Rathour, V.S., Yamazaki, K. et al. Deep reinforcement learning in computer vision: a comprehensive survey. Artif Intell Rev 55, 2733–2819 (2022). https://doi.org/10.1007/s10462-021-10061-9
DOI: https://doi.org/10.1007/s10462-021-10061-9