Abstract
Light field (LF) image super-resolution (SR) aims to enhance the detail and clarity of low-resolution (LR) light field images by exploiting the additional information and structure present in LF data. With the rise of deep learning, the performance of LF image SR has improved significantly, but at the cost of growing model parameters and computational complexity, leading to an excessive reliance on computational resources. To address this problem, this paper proposes a lightweight yet effective model. In our approach, we employ Swin Attention to extract features from LF images. Its two main components are the Window and the Shifted Window: local features of the LF images are captured within each Window, and feature correlations across windows are established through the Shifted Window. Furthermore, we introduce Extensive Attention (EA) blocks to capture the global features of the LF image. In addition, we design a Low Computation Convolution (LCC) that removes redundant information before feature extraction, mitigating superfluous computation and thereby improving computational efficiency. Experimental results show that our approach achieves performance close to that of state-of-the-art models while requiring fewer parameters, lower computational complexity, and faster inference.
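The Window/Shifted-Window mechanism described above can be sketched in a few lines. The following is a minimal NumPy illustration of the two core tensor operations (window partitioning and the cyclic shift), not the paper's actual implementation; the function names and the window size of 4 are illustrative assumptions.

```python
import numpy as np

def window_partition(x, ws):
    # Split an (H, W, C) feature map into non-overlapping ws x ws windows;
    # self-attention is then computed independently inside each window.
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    # -> (num_windows, ws*ws, C): one row group per local attention window
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def cyclic_shift(x, ws):
    # Shifted-Window step: roll the map by ws // 2 so the next round of
    # windows straddles the previous window borders, establishing feature
    # correlations across neighbouring windows.
    return np.roll(x, shift=(-(ws // 2), -(ws // 2)), axis=(0, 1))

feat = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
wins = window_partition(feat, 4)                       # regular windows
shifted_wins = window_partition(cyclic_shift(feat, 4), 4)  # shifted windows
print(wins.shape)  # (4, 16, 3): 4 windows of 16 tokens, 3 channels
```

Alternating the two partitions layer by layer is what lets purely local window attention propagate information across the whole feature map.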
Data Availability Statement
The datasets generated and analyzed during the current study are not publicly available due to the excessive size but are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported in part by the Shenzhen Fundamental Research Fund under Grants 20200810150441003 and JCYJ20190808143415801, and in part by the Guangdong Basic and Applied Basic Research Foundation under Grants 2020A1515011559 and 2021A1515012287.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, X., Wu, S., Li, J. et al. Lightweight network with masks for light field image super-resolution based on swin attention. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18588-z