Vector Quantized Image-to-Image Translation

Chen, Yu-Jie; Cheng, Shin-I; Chiu, Wei-Chen; Tseng, Hung-Yu; Lee, Hsin-Ying

doi:10.1007/978-3-031-19787-1_25

ORCID (Wei-Chen Chiu): orcid.org/0000-0001-7715-8306

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13676)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Current image-to-image translation methods formulate the task with conditional generation models. Because these models are constrained by the rich structural information that the conditional context provides, they end up learning only recolorization or regional changes. In this work, we propose introducing vector quantization into the image-to-image translation framework. The vector quantized content representation facilitates not only translation but also the learning of an unconditional distribution shared among different domains. Combined with a disentangled style representation, the proposed method further enables flexible image extension, both within a domain and across domains. Qualitative and quantitative experiments demonstrate that our framework achieves performance comparable to state-of-the-art image-to-image translation and image extension methods. Unlike methods designed for individual tasks, the proposed method is a unified framework that enables applications combining image-to-image translation, unconditional generation, and image extension. For example, it provides style variability for image generation and extension, and equips image-to-image translation with further extension capabilities.
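The abstract's central idea is vector quantizing the content representation. To illustrate only the generic operation, here is a minimal sketch of the nearest-neighbour codebook lookup popularized by VQ-VAE (van den Oord et al., 2017). It is not the authors' implementation. Each continuous content feature vector is replaced by its closest codebook entry under squared Euclidean distance, which yields a grid of discrete indices. The function names `squared_distance` and `quantize`, the codebook values, and the feature values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    features: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Replace each feature vector with its nearest codebook entry.

    Returns the discrete code indices and the quantized vectors.
    Generic VQ-VAE-style lookup (hypothetical helper, not the paper's code).
    """
    if not codebook:
        raise ValueError("codebook must be non-empty")
    indices: List[int] = []
    quantized: List[List[float]] = []
    for f in features:
        # k = argmin_j ||f - e_j||^2 over all codebook entries e_j
        k = min(range(len(codebook)), key=lambda j: squared_distance(f, codebook[j]))
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized


if __name__ == "__main__":
    # Hypothetical 2-D codebook with three entries.
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
    # Hypothetical content features, e.g. flattened spatial positions of an encoder output.
    features = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [0.6, 0.5]]
    idx, q = quantize(features, codebook)
    print(idx)  # prints [0, 1, 2, 1]
```

In VQ-based generators such as VQ-VAE and VQGAN, discrete indices like these are what an autoregressive prior is trained on. That is one way a quantized content space can support unconditional generation. Training such a model also needs a straight-through gradient estimator and codebook/commitment losses, which this lookup-only sketch omits.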

Y.-J. Chen and S.-I. Cheng—Equal contribution.

Acknowledgement

This project is supported by MediaTek Inc. and by MOST (Ministry of Science and Technology, Taiwan) grants 111-2636-E-A49-003 and 111-2628-E-A49-018-MY4. We are grateful to the National Center for High-performance Computing for computer time and facilities.

Author information

Authors and Affiliations

National Chiao Tung University, Hsinchu, Taiwan
Yu-Jie Chen, Shin-I Cheng & Wei-Chen Chiu
MediaTek-NCTU Research Center, Hsinchu, Taiwan
Yu-Jie Chen, Shin-I Cheng & Wei-Chen Chiu
Meta, Menlo Park, USA
Hung-Yu Tseng
Snap Inc., Santa Monica, USA
Hsin-Ying Lee

Corresponding author

Correspondence to Wei-Chen Chiu.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Supplementary material 1 (PDF, 18634 KB)

About this paper

Cite this paper

Chen, YJ., Cheng, SI., Chiu, WC., Tseng, HY., Lee, HY. (2022). Vector Quantized Image-to-Image Translation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13676. Springer, Cham. https://doi.org/10.1007/978-3-031-19787-1_25

DOI: https://doi.org/10.1007/978-3-031-19787-1_25
Published: 21 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19786-4
Online ISBN: 978-3-031-19787-1
eBook Packages: Computer Science, Computer Science (R0)
