Abstract
High-quality structured data with rich annotations are critical components in intelligent vehicle systems dealing with road scenes. However, data curation and annotation require intensive investment and yield low-diversity scenarios. The recently growing interest in synthetic data raises questions about the scope of improvement in such systems and the amount of manual work still required to produce high volumes and variations of simulated data. This work proposes a synthetic data generation pipeline that utilizes existing datasets, like nuScenes, to address the difficulties and domain gaps present in simulated datasets. We show that, by using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation that mimics real scene properties with high fidelity, along with mechanisms to diversify samples in a physically meaningful way. We demonstrate improvements in mIoU through qualitative and quantitative experiments with real and synthetic data for semantic segmentation on the Cityscapes and KITTI-STEP datasets. All relevant code and data are released on GitHub (https://github.com/shubham1810/trove_toolkit).
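The results above are reported as mIoU (mean intersection-over-union). As a minimal sketch, and not the authors' evaluation code, the standard metric can be computed from a confusion matrix over flattened label maps: for each class, IoU = TP / (TP + FP + FN), averaged over classes. The function names, the ignore value of 255 (a common convention for unlabeled pixels in Cityscapes-style train IDs), and the choice to skip classes absent from both prediction and ground truth are assumptions; benchmarks differ on that last point.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], gt: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels per (ground truth, prediction) pair; rows = gt, cols = pred.

    Assumes pred/gt are flattened label maps with values in [0, num_classes),
    except gt pixels equal to ignore_index, which are skipped.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # unlabeled pixel: excluded from evaluation
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, but gt differs
        fn = sum(cm[c]) - tp                        # gt is c, but predicted otherwise
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both gt and pred (assumed: excluded)
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 255]
    pred = [0, 1, 1, 1, 0]
    cm = confusion_matrix(pred, gt, num_classes=2)
    print(cm, mean_iou(cm))
```

In this toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mIoU is 7/12. Accumulating one confusion matrix over the whole evaluation set before computing IoU, rather than averaging per-image scores, is the usual practice for segmentation benchmarks.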
Acknowledgements
This work is funded by iHub-data and mobility at IIIT Hyderabad.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Dokania, S., Subramanian, A., Chandraker, M., Jawahar, C.V. (2022). TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13668. Springer, Cham. https://doi.org/10.1007/978-3-031-20074-8_34
DOI: https://doi.org/10.1007/978-3-031-20074-8_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20073-1
Online ISBN: 978-3-031-20074-8
eBook Packages: Computer Science, Computer Science (R0)