Abstract
In recent years, the rapid development of deep learning technologies has advanced the field of Face Super-Resolution (FSR), particularly with the introduction of numerous methods based on CNNs and Transformers. However, employing these architectures individually often overlooks the relationship between global and local information, resulting in a lack of detail and clarity in FSR images. To address this issue, we propose a novel pixel-enhanced CNN-Transformer hybrid network for FSR, called PEFormer. For the CNN branch, we design a DetailCNN that uses Central Cross Differential Convolution (CCDC) to extract gradient features from adjacent local regions of the face; subsequent feature enhancement yields detail-rich local features. For the Transformer branch, we design a Pixel-level Triple-path Transformer (PTT), which comprises a global feature-extracting Transformer, a Pixel-Feature Attention Block (PFAB) that supplements high-frequency features and pixel information, and residual connections. Together, these three paths capture pixel-level long-range visual dependencies. Cascading DetailCNN and PTT enables frequent interaction between local and global information, improving the extraction of deep features. Additionally, we employ a progressive image reconstruction structure tailored to different upscaling factors, reinforcing multi-scale feature information for high-quality reconstructed images. Moreover, considering the strong identity attributes of face images, an identity loss function is incorporated during training to refine the details of facial components, bringing them closer to the target image and more in line with human perception. Experimental results indicate that our proposed method effectively enhances facial feature details and improves FSR performance, yielding gains in both qualitative and quantitative comparisons against existing methods.
Data availability
CelebA: https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
FFHQ: https://github.com/NVlabs/ffhq-dataset
Helen: http://www.ifp.illinois.edu/~vuongle2/helen/
Author information
Contributions
XL contributed to conceptualization and methodology. XG was involved in methodology and manuscript writing and provided software. YC contributed to data interpretation and manuscript editing. GC was involved in validation and manuscript editing. TY and YC contributed to supervision and manuscript editing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g., a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lu, X., Gao, X., Chen, Y. et al. PEFormer: a pixel-level enhanced CNN-transformer hybrid network for face image super-resolution. SIViP (2024). https://doi.org/10.1007/s11760-024-03395-8