Lightweight network with masks for light field image super-resolution based on swin attention

Multimedia Tools and Applications

Abstract

Light field (LF) image super-resolution (SR) aims to enhance the detail and clarity of low-resolution (LR) light field images by exploiting the additional information and structure present in LF data. Deep learning has significantly improved the performance of LF image SR, but at the cost of more model parameters and higher computational complexity, which leads to excessive reliance on computational resources. To address this problem, this paper proposes a lightweight but effective model. Our approach uses Swin Attention, whose main components are Window and Shifted Window, to extract features from LF images: Window captures the local features of the LF images, and Shifted Window establishes correlations between features. We further introduce Extensive Attention (EA) blocks to capture the global features of the LF image. In addition, we design a variant of Low Computation Convolution (LCC) that eliminates redundant information before feature extraction, which avoids superfluous computation and improves computational efficiency. Experimental results show that our approach achieves performance that is competitive with, though not the best among, state-of-the-art models, while having fewer parameters, lower computational complexity, and faster inference speed.
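The Window and Shifted Window idea mentioned in the abstract can be illustrated with a minimal sketch in the style of the original Swin Transformer. This is not the authors' implementation: the function names (`window_partition`, `cyclic_shift`, `window_ids`), the 4 x 4 grid, and the window and shift sizes are all hypothetical choices made for illustration, and the attention computation itself is omitted.

```python
# Illustrative sketch only: window partitioning and cyclic shifting in the
# style of Swin Transformer. All names and sizes here are hypothetical and
# are not taken from the paper.

def window_partition(grid, window):
    """Split an H x W grid (list of rows) into non-overlapping window x window tiles."""
    height, width = len(grid), len(grid[0])
    assert height % window == 0 and width % window == 0, "grid must tile evenly"
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            tiles.append([row[left:left + window] for row in grid[top:top + window]])
    return tiles


def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` positions, wrapping around."""
    rolled_rows = grid[shift:] + grid[:shift]
    return [row[shift:] + row[:shift] for row in rolled_rows]


def window_ids(height, width, window, shift=0):
    """For each window, return the set of original pixel indices it covers."""
    grid = [[r * width + c for c in range(width)] for r in range(height)]
    if shift:
        grid = cyclic_shift(grid, shift)
    return [{v for row in tile for v in row} for tile in window_partition(grid, window)]


if __name__ == "__main__":
    regular = window_ids(4, 4, window=2)
    shifted = window_ids(4, 4, window=2, shift=1)
    print(regular[0])  # pixels grouped by the first regular window
    print(shifted[0])  # the first shifted window straddles four regular windows
```

In this toy case each regular window groups a 2 x 2 neighbourhood, while the first shifted window contains one pixel from each of the four regular windows, which is how shifting lets information flow between neighbouring windows. In the Swin Transformer, regions that wrap around the border after the shift are masked during attention; that step is left out of this sketch.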

Data Availability Statement

The datasets generated and analyzed during the current study are not publicly available because of their excessive size, but are available from the corresponding author on reasonable request.

Acknowledgements

This work was supported by the Shenzhen Fundamental Research Fund under Grants 20200810150441003 and JCYJ20190808143415801, and in part by the Guangdong Basic and Applied Basic Research Foundation under Grants 2020A1515011559 and 2021A1515012287.

Author information

Corresponding author

Correspondence to Xingzheng Wang.

Ethics declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, X., Wu, S., Li, J. et al. Lightweight network with masks for light field image super-resolution based on swin attention. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18588-z

  • DOI: https://doi.org/10.1007/s11042-024-18588-z
