Towards photorealistic face generation using text-guided Semantic-Spatial FaceGAN

Guo, Qi; Gu, Xiaodong

doi:10.1007/s11042-024-19320-7


Published: 15 May 2024


Multimedia Tools and Applications


Abstract

In this paper, we propose a simple yet effective Text-To-Face (T2F) generative adversarial network, named Semantic-Spatial FaceGAN (SS-FaceGAN), to address the challenge of generating facial images from natural language descriptions. Natural language is inherently abstract, whereas images are concrete. This discrepancy poses a significant challenge, especially when multiple descriptions are used to generate accurate images. SS-FaceGAN is designed to generate precise facial features from multiple descriptions. It incorporates a novel Focus Spatial (FS) module that predicts masks from the text semantics to refine the image feature mapping. We also introduce an attention mechanism, the Word Attention Reuse (WAR) module, which leverages the latent distribution of each word in the description to compute word-level attention. Finally, our experiments demonstrate the effectiveness of our approach.
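The abstract does not give the equations of the FS or WAR modules. As a rough illustration of the two kinds of mechanism it describes, the pure-Python sketch below shows (a) generic word-level attention in the AttnGAN style, where each image region attends over the word embeddings of a description, and (b) a text-predicted spatial mask that gates a sentence-conditioned affine modulation of a feature map, loosely in the spirit of semantic-spatial aware GANs. The function names, data layouts, and the residual form h + m * (gamma * h + beta) are assumptions made for this example. It is not the authors' SS-FaceGAN implementation, and it omits the "reuse" aspect of WAR, which the abstract does not specify.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def word_attention(region_feats, word_feats):
    """AttnGAN-style word-level attention (illustrative stand-in, not WAR).

    region_feats: N image-region vectors, each of length D.
    word_feats:   T word vectors, each of length D.
    Returns N word-context vectors: for each region, a softmax-weighted sum
    of the word vectors, weighted by region-word dot-product similarity.
    """
    dim = len(word_feats[0])
    contexts = []
    for region in region_feats:
        weights = softmax([dot(region, w) for w in word_feats])
        contexts.append(
            [sum(a * w[d] for a, w in zip(weights, word_feats)) for d in range(dim)]
        )
    return contexts


def mask_modulate(feature_map, sentence, mask_w, gamma_w, beta_w):
    """Text-conditioned spatial mask gating an affine modulation (hypothetical).

    feature_map: H x W grid of C-channel vectors (nested lists).
    sentence:    sentence embedding of length S.
    mask_w:      C + S weights of a 1x1 "conv" giving one mask logit per pixel.
    gamma_w, beta_w: C x S matrices mapping the sentence to per-channel
                 scale and shift.
    Each channel value h becomes h + m * (gamma * h + beta) with m in (0, 1),
    so pixels the mask switches off pass through almost unchanged.
    """
    gamma = [dot(row, sentence) for row in gamma_w]
    beta = [dot(row, sentence) for row in beta_w]
    out = []
    for row in feature_map:
        new_row = []
        for pixel in row:
            # The mask logit depends on both the local feature and the text.
            m = sigmoid(dot(mask_w, pixel + sentence))
            new_row.append(
                [h + m * (g * h + b) for h, g, b in zip(pixel, gamma, beta)]
            )
        out.append(new_row)
    return out
```

For example, with all-zero mask weights every pixel gets m = 0.5, so a single pixel [1.0, 2.0] modulated with gamma = [1, 0] and beta = [0, 1] becomes [1.5, 2.5]. Keeping the modulation residual means a mask value near zero leaves the original features intact, which is one simple way to confine text-driven edits to the predicted region.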


Fig. 7


Data Availability

All data generated or analysed during this study are included in this article.


Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant 62176062.

Author information

Authors and Affiliations

Department of Electronic Engineering, Fudan University, Shanghai, 200438, China
Qi Guo & Xiaodong Gu


Contributions

Qi Guo: Conceptualization of this study, Methodology, Software, Writing (original draft). Xiaodong Gu: Supervision, Conceptualization and methodology, Writing (original draft), Project administration.

Corresponding author

Correspondence to Xiaodong Gu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Guo, Q., Gu, X. Towards photorealistic face generation using text-guided Semantic-Spatial FaceGAN. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19320-7


Received: 24 October 2022
Revised: 06 March 2024
Accepted: 30 April 2024
Published: 15 May 2024
DOI: https://doi.org/10.1007/s11042-024-19320-7
