Abstract
This research concerns the use of machine learning algorithms for multiscale coding of digital video sequences. Using machine learning, a digital image coder is generalized to the coding of video sequences. To this end, we propose an algorithm that accounts for the interdependence of video frames by means of linear regression. The generalized coder uses a multiscale representation of video frames, three-dimensional neural network interpolation of the levels of that multiscale representation, and generative adversarial network replacement of homogeneous regions of a video frame with synthetic video data. Block diagrams illustrate both the method for coding the entire video and the method for coding individual video frames. A formal description of how correlation between video frames is taken into account is given. Numerical experiments are carried out on real video sequences. The experimental results indicate that the algorithm is promising for video coding and processing.
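The abstract does not spell out how the regression is set up, so the following is only a minimal sketch of one way linear regression could capture dependence between video frames. It assumes that a frame is predicted as a weighted sum of earlier frames plus a bias, with one global set of weights per frame, and that only the prediction residual would then be passed on to the coder. The function names `fit_frame_predictor` and `predict_frame`, the use of NumPy, and the synthetic frames are all illustrative assumptions, not the author's implementation.

```python
import numpy as np


def fit_frame_predictor(prev_frames, target):
    """Fit least-squares weights that predict `target` from earlier frames.

    prev_frames: array of shape (k, H, W) holding k earlier frames.
    target:      array of shape (H, W), the frame to be predicted.
    Returns k + 1 coefficients: one weight per earlier frame, then a bias.
    """
    k = prev_frames.shape[0]
    # Each pixel is one sample; each earlier frame contributes one feature.
    features = prev_frames.reshape(k, -1).T.astype(np.float64)
    features = np.column_stack([features, np.ones(features.shape[0])])
    values = target.reshape(-1).astype(np.float64)
    coeffs, *_ = np.linalg.lstsq(features, values, rcond=None)
    return coeffs


def predict_frame(prev_frames, coeffs):
    """Apply the fitted weights and bias to the earlier frames."""
    k = prev_frames.shape[0]
    features = prev_frames.reshape(k, -1).T.astype(np.float64)
    prediction = features @ coeffs[:-1] + coeffs[-1]
    return prediction.reshape(prev_frames.shape[1:])


# Synthetic example (assumed data): the third frame is an exact linear
# combination of the first two, so the residual should be close to zero.
rng = np.random.default_rng(0)
frame0 = rng.uniform(0.0, 255.0, size=(16, 16))
frame1 = rng.uniform(0.0, 255.0, size=(16, 16))
frame2 = 0.3 * frame0 + 0.7 * frame1 + 5.0

prev = np.stack([frame0, frame1])
coeffs = fit_frame_predictor(prev, frame2)
residual = frame2 - predict_frame(prev, coeffs)
print("coefficients:", coeffs)
print("largest absolute residual:", np.max(np.abs(residual)))
```

On real video the residual would not vanish; the point of such a predictor is that the residual typically carries less energy than the frame itself and is therefore cheaper to code. A practical scheme might fit the weights per block or per scale level rather than per frame, which this sketch does not attempt.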
![](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS1060992X23030037/MediaObjects/12005_2023_5178_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS1060992X23030037/MediaObjects/12005_2023_5178_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS1060992X23030037/MediaObjects/12005_2023_5178_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS1060992X23030037/MediaObjects/12005_2023_5178_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS1060992X23030037/MediaObjects/12005_2023_5178_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS1060992X23030037/MediaObjects/12005_2023_5178_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.3103%2FS1060992X23030037/MediaObjects/12005_2023_5178_Fig7_HTML.png)
Funding
The research was supported by the Russian Science Fund, project no. 22‑21‑00662.
Ethics declarations
The author declares that he has no conflicts of interest.
About this article
Cite this article
Gashnikov, M.V. Machine Learning for Multiscale Video Coding. Opt. Mem. Neural Networks 32, 189–196 (2023). https://doi.org/10.3103/S1060992X23030037
DOI: https://doi.org/10.3103/S1060992X23030037