Lightweight network with masks for light field image super-resolution based on swin attention

Wang, Xingzheng; Wu, Shaoyong; Li, Jiahui; Wu, Jianbin

doi:10.1007/s11042-024-18588-z

Published: 29 February 2024

Multimedia Tools and Applications (2024)

Xingzheng Wang (ORCID: orcid.org/0000-0002-5433-3631), Shaoyong Wu, Jiahui Li & Jianbin Wu

Abstract

Light field (LF) image super-resolution (SR) enhances the detail and clarity of low-resolution (LR) LF images by exploiting the additional information and structure present in LF data. Deep learning has significantly improved the performance of LF image SR, but at the cost of more model parameters and higher computational complexity, which leads to excessive reliance on computational resources. To address this problem, this paper proposes a lightweight but effective model. We employ Swin Attention, whose main components are Window and Shifted Window, to extract features from LF images: Window captures the local features of the LF images, and Shifted Window establishes correlations among those features. Furthermore, we introduce Extensive Attention (EA) blocks to capture the global features of the LF image. We also design a version of Low Computation Convolution (LCC) that eliminates redundant information before feature extraction, which reduces superfluous computation and improves computational efficiency. Experimental results show that our approach achieves competitive, though not the best, performance compared with other state-of-the-art models, while having fewer parameters, lower computational complexity, and faster inference speed.
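The Window and Shifted Window idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it follows the general shifted-window scheme of the Swin Transformer, uses plain single-head attention without learned weights, and all function names (`roll`, `attend`, `window_attention`, `swin_pair`) are hypothetical. It also omits the attention mask that the original Swin Transformer applies after the cyclic shift; whether the masks in this paper play the same role cannot be confirmed from the abstract alone.

```python
import math

def roll(grid, shift):
    """Cyclically shift a 2D grid by `shift` rows and columns (wraps around)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r - shift) % h][(c - shift) % w] for c in range(w)]
            for r in range(h)]

def attend(tokens):
    """Softmax self-attention over a list of feature vectors (no learned weights)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(wt * v[i] for wt, v in zip(weights, tokens)) / total
                    for i in range(d)])
    return out

def window_attention(grid, win):
    """Apply `attend` independently inside each non-overlapping win x win window."""
    h, w = len(grid), len(grid[0])
    assert h % win == 0 and w % win == 0, "grid size must be a multiple of win"
    out = [[None] * w for _ in range(h)]
    for r0 in range(0, h, win):
        for c0 in range(0, w, win):
            cells = [(r, c) for r in range(r0, r0 + win)
                     for c in range(c0, c0 + win)]
            mixed = attend([grid[r][c] for r, c in cells])
            for (r, c), vec in zip(cells, mixed):
                out[r][c] = vec
    return out

def swin_pair(grid, win):
    """Window attention, then attention on windows shifted by win // 2."""
    s = win // 2
    y = window_attention(grid, win)   # local mixing inside each window
    y = roll(y, -s)                   # cyclic shift so windows straddle old borders
    y = window_attention(y, win)      # mixing across the former window borders
    return roll(y, s)                 # undo the shift

# A 4 x 4 grid of 1-D features with a single non-zero pixel at (0, 0).
grid = [[[0.0] for _ in range(4)] for _ in range(4)]
grid[0][0] = [1.0]
local_only = window_attention(grid, 2)
shifted = swin_pair(grid, 2)
```

In this toy input, `local_only[2][2]` stays at zero because pixel (2, 2) never shares a window with pixel (0, 0), while `shifted[2][2]` becomes positive once the shifted windows link neighbouring windows. Because the shift is cyclic, pixels on opposite edges of the grid also end up in the same window, which is the situation the Swin Transformer's attention mask is designed to handle.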

Data Availability Statement

The datasets generated and analyzed during the current study are not publicly available due to the excessive size but are available from the corresponding author on reasonable request.

Acknowledgements

This work was supported in part by the Shenzhen Fundamental Research Fund under Grants 20200810150441003 and JCYJ20190808143415801, and in part by the Guangdong Basic and Applied Basic Research Foundation under Grants 2020A1515011559 and 2021A1515012287.

Author information

Authors and Affiliations

College of Mechatronics and Control Engineering, Shenzhen University, Nanhai Avenue, Shenzhen, 518060, Guangdong Province, China
Xingzheng Wang, Shaoyong Wu, Jiahui Li & Jianbin Wu

Corresponding author

Correspondence to Xingzheng Wang.

Ethics declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, X., Wu, S., Li, J. et al. Lightweight network with masks for light field image super-resolution based on swin attention. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18588-z

Received: 27 August 2023
Revised: 23 January 2024
Accepted: 02 February 2024
Published: 29 February 2024
DOI: https://doi.org/10.1007/s11042-024-18588-z
