Style spectroscope: improve interpretability and controllability through Fourier analysis


Abstract

Universal style transfer (UST) infuses styles from arbitrary reference images into content images. Existing methods, despite their many practical successes, cannot explain several experimental observations, including the differing abilities of UST algorithms to preserve the spatial structure of content images. Moreover, these methods offer only cumbersome global control over stylization, requiring additional spatial masks to achieve a desired result. In this work, we first provide a systematic Fourier analysis of a general framework for UST. We derive an equivalent form of the framework in the frequency domain, which implies that existing algorithms treat all frequency components and pixels of feature maps equally, except for the zero-frequency component. We connect the Fourier amplitude and phase with a widely used style loss and a well-known content reconstruction loss in style transfer, respectively. Based on this equivalence and these connections, we interpret the different structure-preservation behaviors of algorithms through the lens of Fourier phase. Building on these interpretations, we propose two plug-and-play manipulations for style transfer methods that achieve better structure preservation and desired stylization. Both qualitative and quantitative experiments demonstrate that our manipulations improve mainstream methods without any additional training; specifically, the metrics improve by 6% on average on content images from the MS-COCO dataset and style images from the WikiArt dataset. We also conduct experiments to demonstrate (1) the aforementioned equivalence, (2) the interpretability based on Fourier amplitude and phase, and (3) the controllability associated with frequency components.
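Since the abstract hinges on the amplitude/phase split of feature maps, a minimal sketch may help fix ideas. This is not the authors' algorithm: it only illustrates, with NumPy and hypothetical names (`amplitude_phase`, `recombine`, random stand-in features), how a feature map decomposes into Fourier amplitude (linked above to the style loss) and phase (linked to content reconstruction), and how grafting a stylized amplitude onto the content phase retains spatial structure.

```python
import numpy as np

def amplitude_phase(feat):
    """Split a feature map of shape (C, H, W) into Fourier amplitude and phase."""
    spectrum = np.fft.fft2(feat, axes=(-2, -1))  # 2-D FFT over each channel
    return np.abs(spectrum), np.angle(spectrum)

def recombine(amplitude, phase):
    """Invert the decomposition: rebuild spatial features from amplitude and phase."""
    spectrum = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(spectrum, axes=(-2, -1)))

# Random stand-ins for encoder features of a content image and its stylized
# counterpart (in practice these would come from a VGG-style encoder).
content_feat = np.random.rand(64, 32, 32)
stylized_feat = np.random.rand(64, 32, 32)

# Keep the content phase (spatial structure) and take the stylized
# amplitude (style statistics), then map back to the spatial domain.
amp_stylized, _ = amplitude_phase(stylized_feat)
_, phase_content = amplitude_phase(content_feat)
structure_preserving_feat = recombine(amp_stylized, phase_content)
```

Swapping which component is kept is the intuition behind the structure-preservation interpretation; the paper's actual manipulations operate within a general UST framework rather than on raw FFTs of random arrays.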


Availability of data and materials

All data used in this work are publicly available.

Code availability

The code will be made available upon acceptance of this paper.


Funding

This work was supported in part by the National Natural Science Foundation of China (No. 62176061), STCSM project (No. 22511105000), UniDT's Cognitive Computing and Few Shot Learning Project, and the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning.

Author information

Contributions

Zhiyu Jin and Xuli Shen mainly conducted the experiments and wrote this manuscript. Bin Li guided the design of the method and experiments. Xiangyang Xue provided suggestions for improving the method. All authors read and approved this manuscript.

Corresponding author

Correspondence to Bin Li.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval

Not applicable.

Consent to participate

Written informed consent was obtained from individual participants or their guardians.

Consent for publication

Not applicable.

Additional information

Editors: Vu Nguyen, Dani Yogatama.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jin, Z., Shen, X., Li, B. et al. Style spectroscope: improve interpretability and controllability through Fourier analysis. Mach Learn 113, 3485–3503 (2024). https://doi.org/10.1007/s10994-023-06435-5

