Abstract
Despite remarkable accuracy improvements in object detectors based on convolutional neural networks (CNNs), applying them in safety-critical domains such as self-driving remains problematic, in part because the correctness of detection results is hard to verify and safety guarantees are lacking. By modeling the bounding box parameters of a real-time object detector with a Gaussian distribution, we propose a new method for predicting uncertainty, which quantifies the reliability of the network's predictions and allows detection results to be validated at low computational cost. In addition, we redesign the loss function by adding a new regularization term based on virtual adversarial training (VAT). VAT, which is defined in terms of the robustness of the conditional label distribution around each input against local perturbation, smooths the output distribution, so that the regularized model produces more robust predictions with lower uncertainty. In consideration of the trade-off between model size and speed, we choose several lightweight models as the backbone of a YOLOv3 detector, and experimental results on the PASCAL VOC and MS COCO datasets demonstrate the effectiveness of the proposed approach.
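The Gaussian modeling of bounding box parameters mentioned above can be illustrated with a minimal sketch. It assumes the common formulation in which the detector predicts a mean and a standard deviation for each of the four box coordinates and is trained with a Gaussian negative log-likelihood; the abstract does not give the exact loss, so the function names (`gaussian_nll`, `box_nll`, `box_uncertainty`) and the choice of the mean standard deviation as the uncertainty score are hypothetical.

```python
import math


def gaussian_nll(mu, sigma, target, eps=1e-9):
    """Negative log-likelihood of `target` under N(mu, sigma^2).

    `eps` is an assumed stabilizer that keeps the variance strictly positive.
    """
    var = sigma * sigma + eps
    return 0.5 * math.log(2.0 * math.pi * var) + (target - mu) ** 2 / (2.0 * var)


def box_nll(mus, sigmas, targets):
    """Sum the per-coordinate NLL over the box parameters (e.g. x, y, w, h)."""
    return sum(gaussian_nll(m, s, t) for m, s, t in zip(mus, sigmas, targets))


def box_uncertainty(sigmas):
    """Hypothetical scalar uncertainty: mean predicted standard deviation."""
    return sum(sigmas) / len(sigmas)


if __name__ == "__main__":
    target = [0.50, 0.50, 0.20, 0.30]   # ground-truth box (illustrative values)
    mus = [0.52, 0.48, 0.22, 0.31]      # predicted means
    sigmas = [0.05, 0.05, 0.10, 0.10]   # predicted standard deviations
    print("loss:", box_nll(mus, sigmas, target))
    print("uncertainty:", box_uncertainty(sigmas))
```

Under this formulation, a large predicted standard deviation reduces the penalty for a coordinate error but adds a log-variance cost, so the network is pushed to report high uncertainty only where its localization is actually unreliable.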
Acknowledgements
This work is sponsored by the National Natural Science Foundation of China (No. 51874022, No. 51674031) and the National Key R&D Program of China (No. 2018YFB0704304).
Cite this article
Chen, Y., Xu, K., He, D. et al. Generating robust real-time object detector with uncertainty via virtual adversarial training. Int. J. Mach. Learn. & Cyber. 13, 431–445 (2022). https://doi.org/10.1007/s13042-021-01416-3
DOI: https://doi.org/10.1007/s13042-021-01416-3