PEFormer: a pixel-level enhanced CNN-transformer hybrid network for face image super-resolution

  • Original Paper
  • Signal, Image and Video Processing

Abstract

In recent years, the rapid development of deep learning has advanced the field of Face Super-Resolution (FSR), particularly through numerous methods based on CNNs and Transformers. However, employing these architectures individually often overlooks the relationship between global and local information, resulting in a lack of detail and clarity in FSR images. To address this issue, we propose a novel pixel-level enhanced CNN-Transformer hybrid network for FSR, called PEFormer. For the CNN branch, we design a DetailCNN that uses Central Cross Differential Convolution (CCDC) to extract gradient features from adjacent local regions of the face; subsequent feature enhancement yields detail-rich local features. For the Transformer branch, we design a Pixel-level Triple-path Transformer (PTT), which comprises a global feature-extracting Transformer, a Pixel-Feature Attention Block (PFAB) that supplements high-frequency features and pixel information, and residual connections. Together, these three paths capture pixel-level long-range visual dependencies. Cascading DetailCNN and PTT enables frequent interaction between local and global information, improving the extraction of deep features. Additionally, we employ a progressive image reconstruction structure tailored to different upscaling factors, reinforcing multi-scale feature information for high-quality reconstructed images. Moreover, given the strong identity attributes of face images, an identity loss function is incorporated during training to refine the details of facial components, bringing them closer to the target image and to human perception. Experimental results indicate that our proposed method effectively enhances facial feature details and improves FSR performance, showing gains in both qualitative and quantitative metrics compared to existing methods.
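The paper's implementation is not reproduced on this page, but two of the ingredients named in the abstract can be illustrated. First, a minimal PyTorch-style sketch of a central-difference convolution restricted to cross-shaped taps, in the spirit of the CCDC described above; the class name, the 3x3 kernel size, and the blending weight `theta` are assumptions for illustration, not the authors' code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralCrossDiffConv2d(nn.Module):
    """Illustrative cross-restricted central-difference convolution.

    Blends a vanilla 3x3 response with a differential response computed
    only over the cross-shaped taps (middle row and middle column), so
    the layer reacts to local gradients around each pixel.
    """

    def __init__(self, in_ch: int, out_ch: int, theta: float = 0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.theta = theta  # 0 -> vanilla conv, 1 -> purely differential

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv(x)  # vanilla convolution response
        # Keep only the cross-shaped taps of the learned kernel.
        w = self.conv.weight
        cross = torch.zeros_like(w)
        cross[:, :, 1, :] = w[:, :, 1, :]  # middle row
        cross[:, :, :, 1] = w[:, :, :, 1]  # middle column
        # Central difference: each tap sees (neighbour - centre), which
        # reduces to conv_cross(x) minus sum(cross weights) * centre pixel.
        centre = cross.sum(dim=(2, 3), keepdim=True)  # 1x1 kernel
        diff = F.conv2d(x, cross, padding=1) - F.conv2d(x, centre)
        return (1 - self.theta) * out + self.theta * diff
```

The differential term responds to local gradients (neighbour minus centre), which is what lets such a layer emphasise facial edges and fine texture that a plain convolution tends to smooth over.

Second, a sketch of an identity loss of the kind the abstract describes; `id_encoder` stands in for a frozen, pretrained face-recognition embedder, and the cosine-distance form is an assumption rather than the paper's exact formulation:

```python
def identity_loss(sr: torch.Tensor, hr: torch.Tensor,
                  id_encoder: nn.Module) -> torch.Tensor:
    """Cosine-distance identity loss between the SR output and the HR
    target, measured in the embedding space of a frozen face recognizer."""
    with torch.no_grad():
        e_hr = F.normalize(id_encoder(hr), dim=1)  # target identity embedding
    e_sr = F.normalize(id_encoder(sr), dim=1)      # embedding of SR image
    return (1.0 - (e_sr * e_hr).sum(dim=1)).mean()
```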


Data availability

  • CelebA: https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
  • FFHQ: https://github.com/NVlabs/ffhq-dataset
  • Helen: http://www.ifp.illinois.edu/~vuongle2/helen/

Author information

Contributions

XL contributed to conceptualization and methodology. XG was involved in methodology and manuscript writing and provided software. YC contributed to data interpretation and manuscript editing. GC was involved in validation and manuscript editing. TY and YC contributed to supervision and manuscript editing.

Corresponding author

Correspondence to Xing Gao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lu, X., Gao, X., Chen, Y. et al. PEFormer: a pixel-level enhanced CNN-transformer hybrid network for face image super-resolution. SIViP (2024). https://doi.org/10.1007/s11760-024-03395-8
