Abstract
We review the basic ideas used to solve the problems of detecting and classifying objects in images with neural network technologies. Key publications on the most popular approaches to improving classification accuracy are considered. We show that, over the last decade, neural network object-detection methods have achieved significant success through convolutional architectures and deep learning on large databases. The main shortcomings, limitations, and possible directions for improving existing approaches are analyzed.
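The convolution operation underlying the networks surveyed here can be illustrated with a minimal sketch. The code below is illustrative only (not any specific system from the review) and implements valid-mode 2D cross-correlation, the per-channel building block of a convolutional layer; real frameworks such as PyTorch implement it far more efficiently.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a 2D list `image` with `kernel`.

    This is the elementary operation a convolutional layer applies to
    each input channel: slide the kernel over the image and take a
    weighted sum at each position.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# A vertical-edge (Sobel-like) kernel responds strongly where the image
# changes from dark to bright columns.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]
print(conv2d(image, sobel_x))  # every window straddles the edge: [[4.0, 4.0], [4.0, 4.0]]
```

In a trained network such kernels are not hand-designed; their weights are learned by backpropagation, which is what distinguishes the deep-learning detectors discussed here from classical hand-crafted feature methods.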
![Fig. 1](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig1_HTML.png)
![Fig. 2](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig2_HTML.png)
![Fig. 3](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig3_HTML.png)
![Fig. 4](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig4_HTML.png)
![Fig. 5](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig5_HTML.png)
![Fig. 6](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig6_HTML.png)
![Fig. 7](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig7_HTML.png)
![Fig. 8](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig8_HTML.png)
![Fig. 9](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig9_HTML.png)
![Fig. 10](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig10_HTML.png)
![Fig. 11](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig11_HTML.png)
![Fig. 12](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig12_HTML.png)
![Fig. 13](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig13_HTML.png)
![Fig. 14](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig14_HTML.png)
![Fig. 15](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS8756699023030032/MediaObjects/11974_2023_8269_Fig15_HTML.png)
Funding
The research was carried out within the state assignment of Ministry of Science and Higher Education of the Russian Federation (project no. 121022000116-0) at the Institute of Automation and Electrometry of the Siberian Branch of the Russian Academy of Sciences.
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
Translated by L. Trubitsyna
About this article
Cite this article
Borzov, S. M. and Nezhevenko, E. S., "Neural network technologies for detection and classification of objects," Optoelectron., Instrum. Data Process. 59, 329–345 (2023). https://doi.org/10.3103/S8756699023030032