Abstract
With the continued proliferation of smartphones and their ever-improving cameras, people routinely take large numbers of photos in daily life, creating a natural demand for image editing. With style-based GANs, images can be plausibly edited along specific semantic attributes by manipulating the latent space of the generator, particularly for human facial photographs. However, such methods rely heavily on datasets that are both diverse and richly annotated with semantics. Unfortunately, no such dataset exists for outdoor scenes, whose content is diverse and structurally complex, which renders current editing methods largely ineffective. To overcome these challenges, we first construct a large synthetic outdoor scene dataset with fine-grained semantic annotations via an automated pipeline. Building on it, we propose an editing network tailored to multi-class annotations that efficiently edits specific attributes while preserving the others as much as possible. Extensive experiments show that our method achieves better performance in outdoor scene editing, especially with respect to distance and viewpoint, across several outdoor scene datasets.
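The latent-space manipulation the abstract refers to can be sketched in a few lines: an inverted image's latent code is shifted along a learned semantic direction, with a scalar step size controlling edit strength. This is a minimal illustration only; the generator, the 512-dimensional latent size, and the random direction stand in for the paper's actual trained network and learned attribute directions.

```python
import numpy as np

# Minimal sketch of latent-direction editing in a style-based GAN.
# In practice, `w` comes from GAN inversion of a real photo and `d` is a
# semantic direction learned from attribute annotations; both are random
# placeholders here, used only to show the arithmetic of the edit.
rng = np.random.default_rng(0)
latent_dim = 512  # typical StyleGAN latent size (assumption)

w = rng.normal(size=latent_dim)   # latent code of an inverted image
d = rng.normal(size=latent_dim)
d /= np.linalg.norm(d)            # unit-norm semantic direction

def edit(w, direction, alpha):
    """Shift the latent code along `direction` by strength `alpha`."""
    return w + alpha * direction

w_edited = edit(w, d, alpha=3.0)

# Because `d` is unit-norm, the code moves exactly |alpha| in latent space;
# decoding w_edited with the generator would yield the edited image.
print(round(float(np.linalg.norm(w_edited - w)), 6))  # -> 3.0
```

The editing network proposed in the paper goes beyond this single-direction picture by handling multi-class annotations and explicitly preserving non-target attributes, but the underlying operation is still a controlled move in the generator's latent space.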
![Fig. 1](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig1_HTML.png)
![Fig. 2](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig2_HTML.png)
![Fig. 3](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig3_HTML.png)
![Fig. 4](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig4_HTML.png)
![Fig. 5](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig5_HTML.png)
![Fig. 6](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig6_HTML.png)
![Fig. 7](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig7_HTML.png)
Availability of data and materials
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request; see Section 3 for more details.
Funding
This work was supported by the National Natural Science Foundation of China (Grant No.61977045).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xie, M., Liu, Z., Xiang, S. et al. Editing outdoor scenes with a large annotated synthetic dataset. Multimed Tools Appl 83, 22837–22854 (2024). https://doi.org/10.1007/s11042-023-16385-8