Abstract
With the continued proliferation of smartphones and their ever-improving cameras, people routinely take large numbers of photos in daily life, creating a natural demand for image editing. With style-based GANs, images can be plausibly edited along specific semantic attributes by manipulating the latent space of the generator, particularly for human facial photographs. However, such methods rely heavily on datasets that are both diverse and richly annotated with semantics. Unfortunately, no such dataset exists for outdoor scenes, whose content is diverse and structurally complex, which renders current editing methods largely ineffective. To overcome these challenges, we first construct a large synthetic outdoor scene dataset with fine-grained semantic annotations via an automated pipeline. Building on it, we propose an editing network tailored to multi-class annotations that efficiently edits specific attributes while preserving the others as much as possible. Extensive experiments show that our method achieves better performance in outdoor scene editing, especially with respect to distance and viewpoint, across several outdoor scene datasets.
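The latent-space manipulation the abstract refers to can be sketched in a few lines: an inverted image's latent code is shifted along a learned semantic direction, with a scalar step size controlling edit strength. This is a minimal illustration only; the generator, the 512-dimensional latent size, and the random direction stand in for the paper's actual trained network and learned attribute directions.

```python
import numpy as np

# Minimal sketch of latent-direction editing in a style-based GAN.
# In practice, `w` comes from GAN inversion of a real photo and `d` is a
# semantic direction learned from attribute annotations; both are random
# placeholders here, used only to show the arithmetic of the edit.
rng = np.random.default_rng(0)
latent_dim = 512  # typical StyleGAN latent size (assumption)

w = rng.normal(size=latent_dim)   # latent code of an inverted image
d = rng.normal(size=latent_dim)
d /= np.linalg.norm(d)            # unit-norm semantic direction

def edit(w, direction, alpha):
    """Shift the latent code along `direction` by strength `alpha`."""
    return w + alpha * direction

w_edited = edit(w, d, alpha=3.0)

# Because `d` is unit-norm, the code moves exactly |alpha| in latent space;
# decoding w_edited with the generator would yield the edited image.
print(round(float(np.linalg.norm(w_edited - w)), 6))  # -> 3.0
```

The editing network proposed in the paper goes beyond this single-direction picture by handling multi-class annotations and explicitly preserving non-target attributes, but the underlying operation is still a controlled move in the generator's latent space.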
![Fig. 1](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig1_HTML.png)
![Fig. 2](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig2_HTML.png)
![Fig. 3](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig3_HTML.png)
![Fig. 4](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig4_HTML.png)
![Fig. 5](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig5_HTML.png)
![Fig. 6](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig6_HTML.png)
![Fig. 7](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16385-8/MediaObjects/11042_2023_16385_Fig7_HTML.png)
Availability of data and materials
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request; see Section 3 for more details.
Funding
This work was supported by the National Natural Science Foundation of China (Grant No.61977045).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xie, M., Liu, Z., Xiang, S. et al. Editing outdoor scenes with a large annotated synthetic dataset. Multimed Tools Appl 83, 22837–22854 (2024). https://doi.org/10.1007/s11042-023-16385-8