Deformable Spatial-Temporal Attention for Lightweight Video Super-Resolution

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14434)


Abstract

Video super-resolution (VSR) aims to recover high-resolution video frames from their corresponding low-resolution frames together with the adjacent consecutive frames. Although some progress has been made, most existing methods use the spatial-temporal information of only two adjacent reference frames to enhance super-resolution reconstruction, which prevents them from achieving satisfactory results. To solve this problem, we propose a deformable spatial-temporal attention (DSTA) module for video super-resolution. The DSTA module improves reconstruction by aggregating favorable spatial-temporal information from multiple reference frames into the current frame. To speed up model training, the attention scheme selects only the top-s most relevant feature points. Experimental results show that our method achieves strong video super-resolution performance with fewer network parameters.
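The abstract does not give the module's exact formulation, but the top-s selection idea can be illustrated with a minimal sketch: for each feature point of the current frame, compute attention scores against feature points gathered from multiple reference frames, keep only the s highest-scoring ones, and aggregate their values with a softmax restricted to that subset. All names, shapes, and the similarity measure below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def top_s_attention(query, keys, values, s=4):
    """Sparse attention over multiple reference frames.

    For each query position, attend only to the top-s most similar
    feature points among all reference-frame positions.

    query:  (N, C) current-frame features (N spatial positions)
    keys:   (M, C) features gathered from multiple reference frames
    values: (M, C) features to aggregate
    Returns an (N, C) array of aggregated features.
    """
    # scaled dot-product similarity between every query and key point
    scores = query @ keys.T / np.sqrt(query.shape[1])        # (N, M)
    # indices of the s highest-scoring key points per query position
    top = np.argpartition(scores, -s, axis=1)[:, -s:]        # (N, s)
    top_scores = np.take_along_axis(scores, top, axis=1)     # (N, s)
    # softmax restricted to the selected s points
    w = np.exp(top_scores - top_scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # weighted sum of the corresponding value vectors
    return np.einsum('ns,nsc->nc', w, values[top])           # (N, C)

# toy usage: 3 reference frames of 16 positions each, 8 channels
rng = np.random.default_rng(0)
q = rng.standard_normal((16, 8))
k = rng.standard_normal((48, 8))   # 3 frames x 16 positions
v = rng.standard_normal((48, 8))
out = top_s_attention(q, k, v, s=4)
print(out.shape)
```

Restricting the softmax to s points reduces the aggregation cost per query from O(M) to O(s) after selection, which is the source of the training speed-up the abstract claims.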




Author information


Corresponding author

Correspondence to Dengshi Li.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Xue, T., Huang, X., Li, D. (2024). Deformable Spatial-Temporal Attention for Lightweight Video Super-Resolution. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14434. Springer, Singapore. https://doi.org/10.1007/978-981-99-8549-4_40


  • DOI: https://doi.org/10.1007/978-981-99-8549-4_40

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8548-7

  • Online ISBN: 978-981-99-8549-4

  • eBook Packages: Computer Science, Computer Science (R0)
