Abstract
Given the broad application of infrared technology across diverse fields, deep-learning-based super-resolution of infrared images is receiving increasing attention. Although current Transformer-based methods achieve impressive results in image super-resolution, the self-attention mechanism at their core treats an image as a one-dimensional sequence of tokens, neglecting its inherent two-dimensional structure. Moreover, infrared images exhibit a uniform pixel distribution and a limited gradient range, which makes it difficult for a model to capture effective features. To address these issues, we propose a Transformer model termed Large Kernel Transformer (LKFormer). Specifically, we design a Large Kernel Residual Depth-wise Convolutional Attention (LKRDA) module whose complexity is linear in the number of pixels. It replaces the standard self-attention layer and mainly uses large-kernel depth-wise convolution to perform non-local feature modeling. Additionally, we devise a novel feed-forward network, the Gated-Pixel Feed-Forward Network (GPFN), to strengthen LKFormer's control over the information flow within the network. Comprehensive experiments show that our method outperforms state-of-the-art methods by a considerable margin while using fewer parameters.
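The abstract describes LKRDA and GPFN only at a high level, so the NumPy sketch that follows illustrates the two ideas under stated assumptions; it is not the authors' implementation. We assume that the LKRDA-style block computes a spatial attention map with one large k×k depth-wise kernel per channel, multiplies it element-wise with the input and adds a residual, and that the GPFN-style block splits a 1×1 projection into a value half and a GELU-activated gate half. The function names (`lkrda_sketch`, `gpfn_sketch`, `mult_adds`), the kernel size k = 21 and the rough cost formulas are hypothetical choices made for illustration.

```python
import numpy as np


def depthwise_conv2d(x, weight):
    """Depth-wise 2-D convolution (cross-correlation) with 'same' zero padding.

    x:      feature map of shape (C, H, W)
    weight: one k x k filter per channel, shape (C, k, k), k odd
    """
    c, h, w = x.shape
    k = weight.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c, h, w), dtype=float)
    # Each kernel tap adds a weighted, shifted copy of the input, so the
    # total work is C * H * W * k * k multiply-adds: linear in H * W.
    for i in range(k):
        for j in range(k):
            out += weight[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out


def lkrda_sketch(x, weight):
    """Hypothetical LKRDA-style block (assumed structure, not the paper's):
    a large-kernel depth-wise conv yields a spatial attention map that
    re-weights the input element-wise, followed by a residual connection."""
    attn = depthwise_conv2d(x, weight)
    return x + x * attn


def gelu(t):
    # tanh approximation of GELU
    return 0.5 * t * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (t + 0.044715 * t ** 3)))


def gpfn_sketch(x, w_in, w_out):
    """Hypothetical gated feed-forward (assumed structure, not the paper's):
    a 1x1 projection is split into a value half and a gate half, and the
    GELU-activated gate multiplies the value pixel by pixel.

    w_in: (2 * D, C) 1x1-conv weights; w_out: (C, D) 1x1-conv weights
    """
    h = np.einsum("oc,chw->ohw", w_in, x)
    value, gate = np.split(h, 2, axis=0)
    return np.einsum("oc,chw->ohw", w_out, value * gelu(gate))


def mult_adds(h, w, c, k):
    """Rough multiply-add counts (Q/K/V projections and softmax ignored)."""
    depthwise = c * h * w * k * k      # linear in the number of pixels
    self_attn = 2 * (h * w) ** 2 * c   # Q K^T and A V: quadratic in pixels
    return depthwise, self_attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 32, 32))
    y = lkrda_sketch(x, 0.01 * rng.standard_normal((8, 21, 21)))
    z = gpfn_sketch(y, rng.standard_normal((32, 8)), rng.standard_normal((8, 16)))
    print(y.shape, z.shape)
    print(mult_adds(64, 64, 64, 21))
```

For a 64×64 feature map with 64 channels, `mult_adds` gives about 1.2×10^8 multiply-adds for a 21×21 depth-wise convolution versus about 2.1×10^9 for the two matrix products of global self-attention; doubling the side length multiplies the first count by 4 and the second by 16, which is the practical meaning of linear versus quadratic complexity here.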
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-18409-3/MediaObjects/11042_2024_18409_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-18409-3/MediaObjects/11042_2024_18409_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-18409-3/MediaObjects/11042_2024_18409_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-18409-3/MediaObjects/11042_2024_18409_Fig4_HTML.png)
Data availability statement
The authors confirm that the data supporting the findings of this study are available in a public repository. These data were derived from the following resources available in the public domain (https://figshare.com/s/2121562561211c0a8101, https://github.com/rafariva/ThermalDatasets).
Funding
This work was supported by the National Key Research and Development Program of China (No. 2023YFE0114900), the Aeronautical Science Foundation of China (No. 2022Z0710T5001), the Guangdong Basic and Applied Basic Research Foundation (No. 2022A1515110570), the Innovation Teams of Youth Innovation in Science and Technology of Higher Education Institutions of Shandong Province (No. 2021KJ088), and the Open Project Program of the State Key Laboratory of CAD&CG, Zhejiang University (No. A2304). The authors would like to thank the reviewers for their comments and suggestions.
Ethics declarations
Source code
The source code will be available at https://github.com/sad192/large-kernel-Transformer
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qin, F., Yan, K., Wang, C. et al. LKFormer: large kernel transformer for infrared image super-resolution. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18409-3
DOI: https://doi.org/10.1007/s11042-024-18409-3