Style spectroscope: improve interpretability and controllability through Fourier analysis


Abstract

Universal style transfer (UST) infuses styles from arbitrary reference images into content images. Existing methods, despite their many practical successes, cannot explain several experimental observations, including the differing abilities of UST algorithms to preserve the spatial structure of content images. Moreover, these methods offer only cumbersome global control over stylization, requiring additional spatial masks to achieve a desired result. In this work, we first provide a systematic Fourier analysis of a general framework for UST. We derive an equivalent form of the framework in the frequency domain, which implies that existing algorithms treat all frequency components and pixels of feature maps equally, except for the zero-frequency component. We connect the Fourier amplitude and phase with a widely used style loss and a well-known content reconstruction loss in style transfer, respectively. Based on this equivalence and these connections, we interpret the different structure-preservation behaviors of algorithms through the lens of Fourier phase. Building on these interpretations, we propose two plug-and-play manipulations for style transfer methods that achieve better structure preservation and desired stylization. Both qualitative and quantitative experiments demonstrate that our manipulations improve mainstream methods without any additional training; specifically, the metrics improve by 6% on average on content images from the MS-COCO dataset and style images from the WikiArt dataset. We also conduct experiments to demonstrate (1) the aforementioned equivalence, (2) the interpretability based on Fourier amplitude and phase, and (3) the controllability associated with frequency components.
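Since the abstract hinges on the amplitude/phase split of feature maps, a minimal sketch may help fix ideas. This is not the authors' algorithm: it only illustrates, with NumPy and hypothetical names (`amplitude_phase`, `recombine`, random stand-in features), how a feature map decomposes into Fourier amplitude (linked above to the style loss) and phase (linked to content reconstruction), and how grafting a stylized amplitude onto the content phase retains spatial structure.

```python
import numpy as np

def amplitude_phase(feat):
    """Split a feature map of shape (C, H, W) into Fourier amplitude and phase."""
    spectrum = np.fft.fft2(feat, axes=(-2, -1))  # 2-D FFT over each channel
    return np.abs(spectrum), np.angle(spectrum)

def recombine(amplitude, phase):
    """Invert the decomposition: rebuild spatial features from amplitude and phase."""
    spectrum = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(spectrum, axes=(-2, -1)))

# Random stand-ins for encoder features of a content image and its stylized
# counterpart (in practice these would come from a VGG-style encoder).
content_feat = np.random.rand(64, 32, 32)
stylized_feat = np.random.rand(64, 32, 32)

# Keep the content phase (spatial structure) and take the stylized
# amplitude (style statistics), then map back to the spatial domain.
amp_stylized, _ = amplitude_phase(stylized_feat)
_, phase_content = amplitude_phase(content_feat)
structure_preserving_feat = recombine(amp_stylized, phase_content)
```

Swapping which component is kept is the intuition behind the structure-preservation interpretation; the paper's actual manipulations operate within a general UST framework rather than on raw FFTs of random arrays.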


Availability of data and materials

All data used in this work are publicly available.

Code availability

The code will be made available upon acceptance of this paper.


Funding

This work was supported in part by the National Natural Science Foundation of China (No. 62176061), STCSM project (No. 22511105000), UniDT's Cognitive Computing and Few Shot Learning Project, and the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning.

Author information

Contributions

Zhiyu Jin and Xuli Shen mainly conducted the experiments and wrote this manuscript. Bin Li guided the design of the method and experiments. Xiangyang Xue provided suggestions for improving the method. All authors read and approved this manuscript.

Corresponding author

Correspondence to Bin Li.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval

Not applicable.

Consent to participate

Written informed consent was obtained from individual participants or their guardians.

Consent for publication

Not applicable.

Additional information

Editors: Vu Nguyen, Dani Yogatama.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jin, Z., Shen, X., Li, B. et al. Style spectroscope: improve interpretability and controllability through Fourier analysis. Mach Learn 113, 3485–3503 (2024). https://doi.org/10.1007/s10994-023-06435-5

