Abstract
Transformers have achieved remarkable results in high-level vision tasks, but their application to low-level computer vision tasks such as image denoising remains largely unexplored. In this paper, we propose a novel channel attention residual enhanced Swin Transformer denoising network (CARSTDn), an efficient and effective Transformer-based architecture. CARSTDn consists of three modules: shallow feature extraction, deep feature extraction, and image reconstruction. The deep feature extraction module is the core of CARSTDn and is built from channel attention residual Swin Transformer blocks (CARSTB). Our benchmark results demonstrate that CARSTDn outperforms existing state-of-the-art denoising methods. We hope that our work will inspire further research into Transformer-based architectures for image denoising.
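The channel attention mechanism referred to in the abstract can be illustrated with a minimal squeeze-and-excitation style sketch: spatially pool a feature map to one descriptor per channel, pass it through a small gating network, and rescale the channels of a residual branch. This is only an assumption-based illustration — the weights, layer sizes, and the surrounding Swin Transformer layers of the actual CARSTB are not reproduced here, and the `channel_attention` function below is hypothetical.

```python
import numpy as np

def channel_attention(x, reduction=4, rng=None):
    """SE-style channel attention on a (C, H, W) feature map, with a
    residual connection. Illustrative only: the paper's CARSTB combines
    channel attention with residual Swin Transformer layers whose exact
    configuration is not given here."""
    rng = np.random.default_rng(0) if rng is None else rng
    c = x.shape[0]
    # Squeeze: global average pooling over the spatial dimensions.
    z = x.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: two small (randomly initialised, hypothetical) FC layers.
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    s = np.maximum(w1 @ z, 0.0)                  # ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))          # sigmoid gate in (0, 1)
    # Scale: reweight each channel, then add the identity (residual) path.
    return x + x * s[:, None, None]

x = np.ones((8, 4, 4))
y = channel_attention(x)
print(y.shape)  # (8, 4, 4)
```

Because the gate lies in (0, 1) and the block is residual, each output channel is a bounded amplification of its input, which is the property that lets such blocks emphasise informative channels without destroying the identity signal.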
Acknowledgements
The authors would like to thank the editor and the anonymous reviewers for their critical and constructive comments and suggestions. This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under [Grant No. 19KJA550002], by the Six Talent Peak Project of Jiangsu Province of China under [Grant No. XYDXX-054], by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dai, Q., Cheng, X. & Zhang, L. Image denoising using channel attention residual enhanced Swin Transformer. Multimed Tools Appl 83, 19041–19059 (2024). https://doi.org/10.1007/s11042-023-16209-9