TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments

Dokania, Shubham; Subramanian, Anbumani; Chandraker, Manmohan; Jawahar, C. V.

doi:10.1007/978-3-031-20074-8_34

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13668)

Included in the following conference series:

European Conference on Computer Vision

Abstract

High-quality structured data with rich annotations are critical components of intelligent vehicle systems that deal with road scenes. However, data curation and annotation require intensive investment and yield low-diversity scenarios. The recent growth of interest in synthetic data raises questions about how much such systems can be improved and how much manual work is still required to produce large volumes and variations of simulated data. This work proposes a synthetic data generation pipeline that uses existing datasets, such as nuScenes, to address the difficulties and domain gaps present in simulated datasets. We show that, using annotations and visual cues from existing datasets, we can automate multi-modal data generation that mimics real scene properties with high fidelity, along with mechanisms to diversify samples in a physically meaningful way. We demonstrate improvements in mIoU through qualitative and quantitative experiments with real and synthetic data for semantic segmentation on the Cityscapes and KITTI-STEP datasets. All relevant code and data are released on GitHub (https://github.com/shubham1810/trove_toolkit).
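The abstract reports segmentation results as mIoU (mean intersection-over-union). As a reminder of how the metric is usually defined, the following is a minimal, hypothetical sketch, not code from the TRoVE toolkit: for each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), computed from a confusion matrix, and mIoU is the mean over classes. The function names, the ignore label 255 (a convention commonly used with Cityscapes-style label maps), and the choice to skip classes that occur in neither ground truth nor prediction are assumptions of this sketch.

```python
from typing import List, Optional, Sequence

IGNORE_INDEX = 255  # assumed "void" label, skipped during evaluation


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int,
                     ignore_index: int = IGNORE_INDEX) -> List[List[int]]:
    """Count flattened pixel labels; rows are ground truth, columns are predictions.

    Assumes every prediction is a valid class id in [0, num_classes).
    """
    conf = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # void pixels do not count toward any class
        conf[g][p] += 1
    return conf


def per_class_iou(conf: List[List[int]]) -> List[Optional[float]]:
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); None if class c never occurs."""
    n = len(conf)
    ious: List[Optional[float]] = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # class c pixels predicted as other classes
        fp = sum(conf[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else None)
    return ious


def mean_iou(conf: List[List[int]]) -> float:
    """Average IoU over classes that appear in ground truth or prediction."""
    valid = [x for x in per_class_iou(conf) if x is not None]
    return sum(valid) / len(valid) if valid else 0.0


if __name__ == "__main__":
    # Toy example: six "pixels", the last one marked as void.
    gt = [0, 0, 1, 1, 2, 255]
    pred = [0, 1, 1, 1, 2, 0]
    conf = confusion_matrix(gt, pred, num_classes=3)
    print(round(mean_iou(conf), 4))  # prints 0.7222
```

In the toy example, the per-class IoUs are 1/2, 2/3 and 1, so the mean is 13/18. Benchmarks usually accumulate one confusion matrix over the whole validation split before computing IoU, rather than averaging per-image scores.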

References

Baidu Apollo team: Apollo: Open Source Autonomous Driving (2017). https://github.com/apolloauto/apollo. Accessed 11 Feb 2022
Alhaija, H., Mustikovela, S., Mescheder, L., Geiger, A., Rother, C.: Augmented reality meets computer vision: efficient data generation for urban driving scenes. Int. J. Comput. Vis. (IJCV) 126, 961–972 (2018)
Atapour-Abarghouei, A., Breckon, T.P.: Real-time monocular depth estimation using synthetic data with domain adaptation via image style transfer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2800–2810 (2018)
Behley, J., et al.: SemanticKITTI: a dataset for semantic scene understanding of lidar sequences. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9297–9307 (2019)
Cabon, Y., Murray, N., Humenberger, M.: Virtual KITTI 2. arXiv preprint arXiv:2001.10773 (2020)
Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving. In: CVPR (2020)
Cai, P., Lee, Y., Luo, Y., Hsu, D.: SUMMIT: a simulator for urban driving in massive mixed traffic. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 4023–4029 (2020). https://doi.org/10.1109/ICRA40945.2020.9197228
CGTrader: 3d model store. https://www.cgtrader.com/
Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)
Chang, M.F., et al.: Argoverse: 3d tracking and forecasting with rich maps. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8748–8757 (2019)
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, Atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)
Chen, W., et al.: Contrastive syn-to-real generalization. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=F8whUO8HNbP
Blender Online Community: Blender - a 3D modelling and rendering package. Blender Foundation, Stichting Blender Foundation, Amsterdam (2018). http://www.blender.org
PolyHaven Community: 3d model and texture store. https://polyhaven.com/
Cordts, M., et al.: The Cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Cordts, M., et al.: The Cityscapes dataset. In: CVPR Workshop on The Future of Datasets in Vision (2015)
Denninger, M., et al.: BlenderProc. arXiv preprint arXiv:1911.01911 (2019)
Devaranjan, J., Kar, A., Fidler, S.: Meta-Sim2: unsupervised learning of scene structure for synthetic data generation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12362, pp. 715–733. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58520-4_42
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: CARLA: an open urban driving simulator. In: Conference on Robot Learning, pp. 1–16. PMLR (2017)
Falcon, W., et al.: PyTorch lightning. GitHub 3, 6 (2019). https://github.com/PyTorchLightning/pytorch-lightning
Gählert, N., Jourdan, N., Cordts, M., Franke, U., Denzler, J.: Cityscapes 3D: dataset and benchmark for 9 DoF vehicle detection. arXiv preprint arXiv:2006.07864 (2020)
Gaidon, A., Wang, Q., Cabon, Y., Vig, E.: Virtual worlds as proxy for multi-object tracking analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4340–4349 (2016)
Hoyer, L., Dai, D., Van Gool, L.: DAFormer: improving network architectures and training strategies for domain-adaptive semantic segmentation. arXiv preprint arXiv:2111.14887 (2021)
Huang, X., et al.: The ApolloScape dataset for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 954–960 (2018)
Trimble Inc: https://3dwarehouse.sketchup.com/
Josifovski, J., Kerzel, M., Pregizer, C., Posniak, L., Wermter, S.: Object detection and pose estimation based on convolutional neural networks trained with synthetic data. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6269–6276. IEEE (2018)
Kishore, A., Choe, T.E., Kwon, J., Park, M., Hao, P., Mittel, A.: Synthetic data generation using imitation training. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3078–3086 (2021)
Li, Z., et al.: OpenRooms: an end-to-end open framework for photorealistic indoor scene datasets. arXiv preprint arXiv:2007.12868 (2020)
Liao, Y., Xie, J., Geiger, A.: KITTI-360: a novel dataset and benchmarks for urban scene understanding in 2D and 3D. arXiv preprint arXiv:2109.13410 (2021)
Luo, Y., Cai, P., Hsu, D., Lee, W.S.: GAMMA: a general agent motion prediction model for autonomous driving. arXiv preprint arXiv:1906.01566 (2019)
Mayer, N.: What makes good synthetic training data for learning disparity and optical flow estimation? Int. J. Comput. Vision 126(9), 942–960 (2018)
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_54
Nowruzi, F.E., Kapoor, P., Kolhatkar, D., Hassanat, F.A., Laganiere, R., Rebut, J.: How much real data do we actually need: analyzing object detection performance using synthetic and real data. arXiv preprint arXiv:1907.07061 (2019)
OpenStreetMap contributors: planet dump retrieved from https://planet.osm.org (2017). https://www.openstreetmap.org
Prakash, A., et al.: Self-supervised real-to-sim scene generation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 16044–16054 (2021)
Prochitecture: Blender-OSM: OpenStreetMap and terrain for blender (2021). https://github.com/vvoovv/blender-osm
Reinhard, E., Adhikhmin, M., Gooch, B., Shirley, P.: Color transfer between images. IEEE Comput. Graphics Appl. 21(5), 34–41 (2001)
Roberts, M., et al.: Hypersim: a photorealistic synthetic dataset for holistic indoor scene understanding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10912–10922 (2021)
Rong, G., et al.: LGSVL simulator: a high fidelity simulator for autonomous driving. In: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), pp. 1–6. IEEE (2020)
Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A.M.: The SYNTHIA dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3234–3243 (2016)
Ruiz, N., Schulter, S., Chandraker, M.: Learning to simulate. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=HJgkx2Aqt7
Shah, S., Dey, D., Lovett, C., Kapoor, A.: AirSim: high-fidelity visual and physical simulation for autonomous vehicles. In: Hutter, M., Siegwart, R. (eds.) Field and Service Robotics. SPAR, vol. 5, pp. 621–635. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-67361-5_40
Tsai, Y.H., Hung, W.C., Schulter, S., Sohn, K., Yang, M.H., Chandraker, M.: Learning to adapt structured output space for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7472–7481 (2018)
Varma, G., Subramanian, A., Namboodiri, A., Chandraker, M., Jawahar, C.: IDD: a dataset for exploring problems of autonomous navigation in unconstrained environments. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1743–1751 (2019). https://doi.org/10.1109/WACV.2019.00190
Varol, G., et al.: Learning from synthetic humans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 109–117 (2017)
Voigtlaender, P., et al.: MOTS: multi-object tracking and segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Wang, J., et al.: AdvSim: generating safety-critical scenarios for self-driving vehicles. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9909–9918 (2021)
Weber, M., et al.: STEP: segmenting and tracking every pixel. In: Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks (2021)
Wrenninge, M., Unger, J.: Synscapes: a photorealistic synthetic dataset for street scene parsing. arXiv preprint arXiv:1810.08705 (2018)
Wulff, J., Butler, D.J., Stanley, G.B., Black, M.J.: Lessons and insights from creating a synthetic optical flow benchmark. In: Fusiello, A., Murino, V., Cucchiara, R. (eds.) ECCV 2012. LNCS, vol. 7584, pp. 168–177. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33868-7_17
Zheng, G., Liu, H., Xu, K., Li, Z.: Learning to simulate vehicle trajectories from demonstrations. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 1822–1825. IEEE (2020)

Acknowledgements

This work is funded by iHub-data and mobility at IIIT Hyderabad.

Author information

Authors and Affiliations

IIIT Hyderabad, Hyderabad, Telangana, India
Shubham Dokania, Anbumani Subramanian & C. V. Jawahar
University of California San Diego, San Diego, CA, USA
Manmohan Chandraker

Corresponding author

Correspondence to Shubham Dokania.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF, 18,081 KB)

About this paper

Cite this paper

Dokania, S., Subramanian, A., Chandraker, M., Jawahar, C.V. (2022). TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13668. Springer, Cham. https://doi.org/10.1007/978-3-031-20074-8_34

DOI: https://doi.org/10.1007/978-3-031-20074-8_34
Published: 12 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20073-1
Online ISBN: 978-3-031-20074-8
eBook Packages: Computer Science, Computer Science (R0)