Abstract
StyleGAN2 was demonstrated to be a powerful image generation engine that supports semantic editing. However, to manipulate a real-world image, one must first retrieve its corresponding latent representation in StyleGAN's latent space, i.e., a code that is decoded to an image as close as possible to the desired image. For many real-world images, such a latent representation does not exist, which necessitates tuning the generator network. We present a per-image optimization method that tunes a StyleGAN2 generator by making a local edit to the generator's weights, achieving almost perfect inversion while still allowing image editing, since the rest of the mapping between an input latent tensor and an output image remains relatively intact. The method is based on a one-shot training of a set of shallow update networks (Gradient Modification Modules) that modify the layers of the generator. After training the Gradient Modification Modules, a modified generator is obtained by a single application of these networks to the original parameters, and the editing capabilities of the original generator are maintained. Our experiments show a sizable performance gap over the current state of the art in this very active domain. Our code is available at https://github.com/sheffier/gani.
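The one-shot tuning scheme described in the abstract can be sketched as follows. This is an illustrative toy sketch under strong simplifying assumptions, not the authors' implementation: the class name `GradientModificationModule`, the flattened weight vector standing in for a generator layer, and the scalar reconstruction target are all hypothetical. A shallow update network predicts an additive per-weight delta; after a short training loop against a reconstruction objective, a single application of the module to the original parameters yields the tuned layer.

```python
import numpy as np

rng = np.random.default_rng(0)

class GradientModificationModule:
    """Shallow two-layer update network that maps a layer's original
    weights to a per-weight additive delta (hypothetical sketch)."""
    def __init__(self, n_params, hidden=8):
        self.A = rng.normal(0.0, 0.01, (n_params, hidden))
        self.B = np.zeros((hidden, n_params))

    def delta(self, w_flat):
        return np.tanh(w_flat @ self.A) @ self.B

# Stand-in for one generator layer (a flattened weight vector) and a
# toy "inversion" objective: make the layer's response to a fixed
# latent z hit a target value (proxy for reconstructing the image).
w_orig = rng.normal(size=16)
z = rng.normal(size=16)
target = 3.0

gmm = GradientModificationModule(n_params=16)

# One-shot training of the module: gradient descent on its second
# layer B. The hidden activation h is fixed, so the objective is
# quadratic in B and the step size below guarantees convergence.
h = np.tanh(w_orig @ gmm.A)
lr = 0.5 / ((h @ h) * (z @ z))
for _ in range(100):
    err = (w_orig + gmm.delta(w_orig)) @ z - target
    gmm.B -= lr * 2.0 * err * np.outer(h, z)

# A single application of the trained module to the original
# parameters yields the tuned generator layer.
w_tuned = w_orig + gmm.delta(w_orig)
print(abs(w_tuned @ z - target))  # near-zero reconstruction error
```

In the paper's setting the delta would be applied to convolutional weights of each StyleGAN2 layer and the objective would combine image-space reconstruction losses; the sketch only illustrates the train-once, apply-once structure.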
References
Abdal, R., Qin, Y., Wonka, P.: Image2stylegan: how to embed images into the stylegan latent space? In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 4431–4440 (2019)
Abdal, R., Zhu, P., Mitra, N.J., Wonka, P.: Styleflow: attribute-conditioned exploration of stylegan-generated images using conditional continuous normalizing flows. ACM Trans. Graph. (TOG) 40(3), 1–21 (2021)
Alaluf, Y., Patashnik, O., Cohen-Or, D., et al.: Restyle: a residual-based stylegan encoder via iterative refinement. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)
Alaluf, Y., Tov, O., Mokady, R., Gal, R., Bermano, A.H.: Hyperstyle: stylegan inversion with hypernetworks for real image editing. arXiv preprint arXiv:2111.15666 (2021)
Bai, Q., Xu, Y., Zhu, J., Xia, W., Yang, Y., Shen, Y.: High-fidelity gan inversion with padding space. arXiv preprint arXiv:2203.11105 (2022)
Bermano, A.H., et al.: State-of-the-art in the architecture, methods and applications of stylegan. arXiv preprint arXiv:2202.14020 (2022)
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: Infogan: interpretable representation learning by information maximizing generative adversarial nets. Adv. Neural Inf. Process. Syst. 29 (2016)
Chen, X., Fan, H., Girshick, R.B., He, K.: Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297 (2020)
Choi, Y., Uh, Y., Yoo, J., Ha, J.W.: Stargan v2: diverse image synthesis for multiple domains. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020)
Creswell, A., Bharath, A.A.: Inverting the generator of a generative adversarial network. IEEE Trans. Neural Netw. Learn. Syst. 30(7), 1967–1974 (2018)
Deng, J., Guo, J., Xue, N., Zafeiriou, S.: Arcface: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699 (2019)
Ding, X., Wang, Y., Xu, Z., Welch, W.J., Wang, Z.J.: Ccgan: continuous conditional generative adversarial networks for image generation. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=PrzjugOsDeE
Dinh, T.M., Tran, A., Nguyen, R.H.M., Hua, B.S.: Hyperinverter: improving stylegan inversion via hypernetwork. arXiv preprint arXiv:2112.00719 (2021)
Feng, Q., Shah, V., Gadde, R., Perona, P., Martinez, A.: Near perfect gan inversion. arXiv preprint arXiv:2202.11833 (2022)
Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Goodfellow, I., et al.: Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27 (2014)
Guan, S., Tai, Y., Ni, B., Zhu, F., Huang, F., Yang, X.: Collaborative learning for faster stylegan embedding. arXiv preprint arXiv:2007.01758 (2020)
Härkönen, E., Hertzmann, A., Lehtinen, J., Paris, S.: Ganspace: discovering interpretable gan controls. Adv. Neural Inf. Process. Syst. 33, 9841–9850 (2020)
He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_38
Huang, Y., et al.: Curricularface: adaptive curriculum learning loss for deep face recognition. In: proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5901–5910 (2020)
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017)
Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019)
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the image quality of stylegan. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8110–8119 (2020)
Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3D object representations for fine-grained categorization. In: 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13), Sydney, Australia (2013)
Ledig, C., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690 (2017)
Lee, C.H., Liu, Z., Wu, L., Luo, P.: Maskgan: towards diverse and interactive facial image manipulation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Liang, H., Hou, X., Shen, L.: Ssflow: style-guided neural spline flows for face image manipulation. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 79–87 (2021)
Ling, H., Kreis, K., Li, D., Kim, S.W., Torralba, A., Fidler, S.: Editgan: high-precision semantic image editing. In: Advances in Neural Information Processing Systems (NeurIPS) (2021)
Luo, J., Xu, Y., Tang, C., Lv, J.: Learning inverse mapping by autoencoder based generative adversarial nets. In: International Conference on Neural Information Processing, pp. 207–216. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-319-70096-0_22
Maas, A.L., Hannun, A.Y., Ng, A.Y., et al.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of ICML, p. 3. Citeseer (2013)
Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
Mitchell, E., Lin, C., Bosselut, A., Finn, C., Manning, C.D.: Fast model editing at scale. CoRR (2021). https://arxiv.org/pdf/2110.11309.pdf
Nguyen, T.Q., Salazar, J.: Transformers without tears: improving the normalization of self-attention. arXiv preprint arXiv:1910.05895 (2019)
Odena, A., Olah, C., Shlens, J.: Conditional image synthesis with auxiliary classifier gans. In: International Conference on Machine Learning, pp. 2642–2651. PMLR (2017)
Pidhorskyi, S., Adjeroh, D.A., Doretto, G.: Adversarial latent autoencoders. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14104–14113 (2020)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)
Richardson, E., et al.: Encoding in style: a stylegan encoder for image-to-image translation. arXiv preprint arXiv:2008.00951 (2020)
Roich, D., Mokady, R., Bermano, A.H., Cohen-Or, D.: Pivotal tuning for latent-based editing of real images. arXiv preprint arXiv:2106.05744 (2021)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Rotman, M., Dekel, A., Gur, S., Oz, Y., Wolf, L.: Unsupervised disentanglement with tensor product representations on the torus. In: International Conference on Learning Representations (2022)
Shen, Y., Gu, J., Tang, X., Zhou, B.: Interpreting the latent space of gans for semantic face editing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9243–9252 (2020)
Shen, Y., Zhou, B.: Closed-form factorization of latent semantics in gans. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1532–1540 (2021)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Tewari, A., et al.: Stylerig: Rigging stylegan for 3D control over portrait images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6142–6151 (2020)
Tov, O., Alaluf, Y., Nitzan, Y., Patashnik, O., Cohen-Or, D.: Designing an encoder for stylegan image manipulation. ACM Trans. Graph. (TOG) 40(4), 1–14 (2021)
Ulyanov, D., Vedaldi, A., Lempitsky, V.: Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022 (2016)
Wang, R., et al.: Attribute-specific control units in stylegan for fine-grained image manipulation. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 926–934 (2021)
Wang, X., et al.: Esrgan: enhanced super-resolution generative adversarial networks. In: Proceedings of the European conference on computer vision (ECCV) workshops (2018)
Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image quality assessment. In: The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, vol. 2, pp. 1398–1402. IEEE (2003)
Wright, L.: Ranger - a synergistic optimizer (2019). http://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
Yao, X., Newson, A., Gousseau, Y., Hellier, P.: Feature-style encoder for style-based gan inversion. arXiv preprint arXiv:2202.02183 (2022)
Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., Xiao, J.: LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365 (2015)
Zhang, L., Bai, X., Gao, Y.: Sals-gan: spatially-adaptive latent space in stylegan for real image embedding. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 5176–5184 (2021)
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–595 (2018)
Zhu, J., Shen, Y., Zhao, D., Zhou, B.: In-domain GAN inversion for real image editing. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12362, pp. 592–608. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58520-4_35
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)
Acknowledgements
This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant ERC CoG 725974).
A Additional Qualitative Results
(See Figs. 10, 11, 12 and 13).
Additional Pose, smile and age editing using InterFaceGAN [42] of images from the CelebA-HQ dataset.
Additional visual comparison of editing quality (Stanford Cars). Edits were obtained using GANSpace [18]
Visual comparison of editing quality (LSUN Horses). Edits were obtained using GANSpace [18]
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sheffi, E., Rotman, M., Wolf, L. (2023). Gradient Adjusting Networks for Domain Inversion. In: Gade, R., Felsberg, M., Kämäräinen, JK. (eds) Image Analysis. SCIA 2023. Lecture Notes in Computer Science, vol 13886. Springer, Cham. https://doi.org/10.1007/978-3-031-31438-4_9
Print ISBN: 978-3-031-31437-7
Online ISBN: 978-3-031-31438-4