Generating robust real-time object detector with uncertainty via virtual adversarial training

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Despite remarkable accuracy improvements in object detectors based on convolutional neural networks (CNNs), applying them in safety-critical domains such as self-driving remains problematic, in part because the correctness of detection results is hard to verify and safety guarantees are lacking. By modeling the bounding box parameters with a Gaussian distribution in a real-time object detector, we propose a new method for predicting uncertainty, which quantifies the reliability of the network's predictions and allows detection results to be validated at low computational cost. In addition, we redesign the loss function by adding a new regularization term based on virtual adversarial training (VAT). The VAT term measures the robustness of the conditional label distribution around each input against local perturbation; minimizing it smooths the output distribution, lowers the predicted uncertainty, and improves the predictions of the regularized model. Considering the trade-off between model size and speed, we choose several lightweight models as the backbone of a YOLOv3 detector, and experimental results on the PASCAL VOC and MS COCO datasets demonstrate the effectiveness of the proposed approach.
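
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the exact loss, so the Gaussian negative log-likelihood used for the box coordinates is an assumption, and the VAT penalty is shown on a toy linear softmax classifier rather than a YOLOv3 head so that the gradient can be written analytically. The names `gaussian_nll`, `vat_penalty`, `epsilon`, `xi` and `n_power` are hypothetical.

```python
import numpy as np


def gaussian_nll(mu, sigma, target):
    """Per-coordinate negative log-likelihood of `target` under N(mu, sigma^2).

    Assumption: each box coordinate gets a predicted mean and standard
    deviation; the standard deviation doubles as the uncertainty estimate.
    """
    var = sigma ** 2
    return 0.5 * np.log(2.0 * np.pi * var) + (target - mu) ** 2 / (2.0 * var)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the last axis."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def vat_penalty(W, x, epsilon=1.0, xi=1e-3, n_power=1, rng=None):
    """VAT-style penalty for a toy model p(y|x) = softmax(W @ x).

    Power iteration approximates the perturbation direction that changes
    the prediction most; the penalty is the KL divergence between the
    clean prediction and the prediction at that perturbed input.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    p = softmax(W @ x)  # clean prediction, used as the "virtual" label
    d = rng.standard_normal(x.shape)
    d /= np.linalg.norm(d)
    for _ in range(n_power):
        q = softmax(W @ (x + xi * d))
        # Analytic gradient of KL(p || q) w.r.t. the perturbation for a
        # linear softmax model (a real detector would use autodiff here).
        grad = W.T @ (q - p)
        d = grad / (np.linalg.norm(grad) + 1e-12)
    r_adv = epsilon * d  # virtual adversarial perturbation of norm <= epsilon
    return kl(p, softmax(W @ (x + r_adv)))


if __name__ == "__main__":
    # Hypothetical predicted box (x, y, w, h), its sigmas, and a ground truth.
    mu = np.array([0.50, 0.50, 0.20, 0.30])
    sigma = np.array([0.05, 0.05, 0.10, 0.10])
    target = np.array([0.52, 0.49, 0.25, 0.28])
    print("box NLL:", gaussian_nll(mu, sigma, target).sum())
    print("box uncertainty:", sigma.mean())

    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 5))
    x = rng.standard_normal(5)
    print("VAT penalty:", vat_penalty(W, x, rng=rng))
```

In a full detector the penalty would typically be added to the detection loss with a weighting coefficient, and the predicted `sigma` values could be averaged per box to give a single localization uncertainty; both choices are assumptions here rather than details taken from the abstract.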

Acknowledgements

This work is sponsored by the National Natural Science Foundation of China (Nos. 51874022 and 51674031) and the National Key R&D Program of China (No. 2018YFB0704304).

Author information

Corresponding authors

Correspondence to Ke Xu or Xiaojuan Ban.

About this article

Cite this article

Chen, Y., Xu, K., He, D. et al. Generating robust real-time object detector with uncertainty via virtual adversarial training. Int. J. Mach. Learn. & Cyber. 13, 431–445 (2022). https://doi.org/10.1007/s13042-021-01416-3

  • DOI: https://doi.org/10.1007/s13042-021-01416-3
