
Editing outdoor scenes with a large annotated synthetic dataset


Abstract

With the continuous popularization of smartphones and their ever-evolving photographic capabilities, individuals can easily take large numbers of photos in their daily lives, creating a natural demand for image editing. With style-based GANs, images can be plausibly edited along specific semantics by manipulating the latent space of the generator, particularly for human facial photographs. However, such methods rely heavily on datasets that offer both diverse data and rich semantic annotations. Unfortunately, no such dataset exists for outdoor scenes, whose content is diverse and structurally complex, which renders current editing methods largely ineffective. To overcome these challenges, we first construct an extensive synthetic outdoor scene dataset with fine-grained semantic annotations via an automated process. Building on it, we propose an editing network dedicated to multi-class annotations that can efficiently edit specific attributes while preserving the others as much as possible. Extensive experiments show that our method achieves better performance in outdoor scene editing, especially with regard to distance and viewpoint, across several outdoor scene datasets.
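
To make the latent-space manipulation concrete, the following is a minimal sketch (not the authors' released code) of editing a single attribute with a pretrained style-based generator: the latent code is shifted along a learned semantic direction and the generator re-synthesizes the image. The Generator class follows the stylegan2-pytorch interface referenced in Note 5; the checkpoint and direction files are hypothetical placeholders.

    # Minimal sketch: edit one attribute by shifting a latent code along a
    # learned semantic direction. Generator follows rosinality/stylegan2-pytorch
    # (Note 5); the checkpoint and direction files are hypothetical.
    import torch
    from model import Generator  # module from the stylegan2-pytorch repo

    device = "cuda" if torch.cuda.is_available() else "cpu"
    g = Generator(size=256, style_dim=512, n_mlp=8).to(device)
    g.load_state_dict(torch.load("outdoor_g_ema.pt")["g_ema"])  # hypothetical weights
    g.eval()

    with torch.no_grad():
        z = torch.randn(1, 512, device=device)   # sample a random latent code
        w = g.style(z)                            # map z to the W latent space

        d = torch.load("attribute_direction.pt").to(device)  # hypothetical direction
        d = d / d.norm()

        for alpha in (-3.0, 0.0, 3.0):            # negative, neutral, positive edit
            img, _ = g([w + alpha * d], input_is_latent=True)

Varying the edit strength alpha trades the magnitude of the attribute change against preservation of the remaining content, which is the trade-off the proposed multi-class editing network is designed to manage.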



Availability of data and materials

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request; see Section 3 for more details.

Notes

  1. https://support.rockstargames.com/articles/115009494848/PC-Single-Player-Mods

  2. http://www.dev-c.com/gtav/scripthookv

  3. https://myronxie.github.io/GOS/

  4. https://github.com/umautobots/GTAVisionExport

  5. https://github.com/rosinality/stylegan2-pytorch
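
As background on how exported game data can be turned into annotations (a generic sketch under assumed file names and label ids, not the actual process described in Section 3): once tools such as those in Notes 2 and 4 export frames with per-pixel semantic masks, image-level attribute labels can be derived by aggregating mask statistics, for example the fraction of sky pixels.

    # Generic sketch: derive an image-level attribute label from a per-pixel
    # semantic mask. The file layout and class id are assumptions, not the
    # paper's actual pipeline.
    import numpy as np
    from PIL import Image

    SKY_CLASS_ID = 23  # assumed label id for "sky" in a single-channel label map

    def sky_fraction(mask_path: str) -> float:
        """Fraction of pixels labeled as sky in one semantic mask."""
        mask = np.array(Image.open(mask_path))
        return float((mask == SKY_CLASS_ID).mean())

    if __name__ == "__main__":
        frac = sky_fraction("frames/000001_mask.png")  # hypothetical path
        label = "open_sky" if frac > 0.3 else "enclosed"
        print(f"sky fraction = {frac:.2f} -> {label}")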


Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 61977045).

Author information


Corresponding author

Correspondence to Mingye Xie.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xie, M., Liu, Z., Xiang, S. et al. Editing outdoor scenes with a large annotated synthetic dataset. Multimed Tools Appl 83, 22837–22854 (2024). https://doi.org/10.1007/s11042-023-16385-8
