Machine Learning for Multiscale Video Coding

Optical Memory and Neural Networks

Abstract

This study applies machine learning algorithms to multiscale coding of digital video sequences. A machine-learning-based digital image coder is generalized to video sequences. To this end, we propose an algorithm that accounts for the interdependence of video frames by means of linear regression. The generalized coder combines a multiscale representation of video frames, three-dimensional neural network interpolation of the levels of that multiscale representation, and generative adversarial network replacement of homogeneous regions of a video frame with synthetic video data. Block diagrams illustrate both the method of coding the entire video and the method of coding individual video frames. A formal description is given of how the correlation between video frames is taken into account. Numerical experiments on real video sequences are reported. The experimental results suggest that the algorithm is promising for video coding and processing.
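
The abstract does not give the exact form of the regression, so the following is only a minimal sketch of one way inter-frame dependency could be captured with linear regression. It assumes a single global model in which each pixel of the current frame is approximated as `a * previous_pixel + b`; the function names, the global (rather than per-block or multiscale) fit, and the toy frames are all illustrative assumptions, not the author's method.

```python
def fit_frame_regression(prev_frame, cur_frame):
    """Least-squares fit of cur ≈ a * prev + b over all pixels (assumed model)."""
    xs = [p for row in prev_frame for p in row]
    ys = [p for row in cur_frame for p in row]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # A flat previous frame carries no slope information; fall back to the mean.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def predict_frame(prev_frame, a, b):
    """Predict the current frame from the previous one using the fitted line."""
    return [[a * p + b for p in row] for row in prev_frame]


def residual(cur_frame, predicted):
    """Prediction error that a coder would store instead of raw pixel values."""
    return [
        [c - q for c, q in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(cur_frame, predicted)
    ]


# Toy 2x2 frames (hypothetical data): the second is exactly 2 * first + 5.
prev_frame = [[10, 20], [30, 40]]
cur_frame = [[25, 45], [65, 85]]

a, b = fit_frame_regression(prev_frame, cur_frame)
res = residual(cur_frame, predict_frame(prev_frame, a, b))
```

In this toy case the residual is zero everywhere because the second frame is an exact linear function of the first. On real video the residual would be nonzero but typically smaller in magnitude than the raw frame, which is what makes such prediction useful before further coding.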

Funding

The research was supported by the Russian Science Fund, project no. 22‑21‑00662.

Author information

Corresponding author

Correspondence to M. V. Gashnikov.

Ethics declarations

The author declares that he has no conflicts of interest.

About this article

Cite this article

Gashnikov, M.V. Machine Learning for Multiscale Video Coding. Opt. Mem. Neural Networks 32, 189–196 (2023). https://doi.org/10.3103/S1060992X23030037

  • DOI: https://doi.org/10.3103/S1060992X23030037
