Neural Network Technologies for Detection and Classification of Objects

  • Published in: Optoelectronics, Instrumentation and Data Processing

Abstract

We present a review of the basic ideas used to solve the problems of detecting and classifying objects in images using neural network technologies. The key publications on the most popular ways to improve classification accuracy are considered. It is shown that over the last decade, neural network methods for detecting objects have achieved significant success by using convolutional architectures and deep learning on large databases. The main shortcomings, limitations, and possible directions for improving existing approaches are analyzed.
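The convolution operation at the heart of the methods surveyed can be sketched in plain Python. This is an illustrative toy, not any network from the review; the `conv2d` helper and the edge-detecting kernel are hypothetical examples of how a single convolutional filter extracts a local feature (here, a vertical edge) from an image.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a grayscale image with a kernel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Dot product of the kernel with the image patch at (i, j).
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A simple vertical-edge kernel (illustrative).
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)
# feature_map → [[0, 2, 0], [0, 2, 0], [0, 2, 0]]: the filter fires only
# at the edge column.
```

In a convolutional network, many such kernels are learned from data rather than hand-designed, and their responses are stacked and composed across layers.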


Figs. 1–15 (figure images omitted in this preview).


Funding

The research was carried out within the state assignment of Ministry of Science and Higher Education of the Russian Federation (project no. 121022000116-0) at the Institute of Automation and Electrometry of the Siberian Branch of the Russian Academy of Sciences.

Author information

Correspondence to S. M. Borzov or E. S. Nezhevenko.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Translated by L. Trubitsyna

About this article

Cite this article

Borzov, S. M. and Nezhevenko, E. S., ''Neural Network Technologies for Detection and Classification of Objects,'' Optoelectron. Instrum. Data Process. 59, 329–345 (2023). https://doi.org/10.3103/S8756699023030032

