Bird’s-Eye View Semantic Segmentation and Voxel Semantic Segmentation Based on Frustum Voxel Modeling and Monocular Camera

Qin, Chao; Wang, Yafei; Zhang, Yuchao; Yin, Chengliang

doi:10.1007/s12204-023-2573-3

基于锥型体素建模和单目相机的鸟瞰图语义分割和体素语义分割

Original Paper
Published: 07 February 2023

Journal of Shanghai Jiaotong University (Science), Volume 28, pages 100–113 (2023)

Chao Qin (秦超)¹,
Yafei Wang (王亚飞)¹,
Yuchao Zhang (张宇超)² &
Chengliang Yin (殷承良)¹

Abstract

Semantic segmentation of the bird’s-eye view (BEV) is crucial for environment perception in autonomous driving. The BEV covers both static elements of the scene, such as drivable areas, and dynamic elements, such as cars. This paper proposes an end-to-end deep learning architecture based on 3D convolution that predicts BEV semantic segmentation, as well as voxel semantic segmentation, from monocular images. Voxelization of the scene and feature transformation from perspective space to camera space are the model's key means of improving prediction accuracy. The effectiveness of the proposed method was demonstrated by training and evaluating the model on the nuScenes dataset. In a comparison with other state-of-the-art methods, the proposed approach achieved better BEV semantic segmentation. It also performs voxel semantic segmentation, which the compared state-of-the-art methods cannot achieve.

摘要

自动驾驶场景中包含静态目标，如可驾驶区域，以及动态目标，如汽车，而鸟瞰图的语义分割对于自主驾驶中的环境感知至关重要。本文提出了一个基于三维卷积的端到端深度学习模型，以单目相机作为输入并预测鸟瞰图的语义分割和体素语义分割。场景的体素化建模和透视空间到相机空间的特征转换是提高本模型预测准确性的关键方法。本模型在nuScenes数据集上进行训练并评估该方法的有效性。与其他经典模型的对比结果表明本文提出的模型在鸟瞰图的语义分割方面优于其他算法。此外本文模型还实现了体素语义分割，而其他模型并不具备体素语义分割的能力。
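
The abstract names the transformation from perspective space to camera space as a key step, but this preview gives no implementation details. The following is a minimal sketch of the underlying geometry only, not the authors' method: it assumes a standard pinhole camera and unprojects the centres of frustum voxels, indexed by pixel position and depth bin, into metric camera coordinates. The function name `frustum_voxel_centers`, the intrinsic matrix values, the image size, the stride, and the depth range are all hypothetical choices made for illustration.

```python
import numpy as np


def frustum_voxel_centers(K, width, height, depths, stride=16):
    """Return camera-space (x, y, z) centres of frustum voxels.

    K       : 3x3 pinhole intrinsic matrix (assumed, no lens distortion).
    depths  : 1-D sequence of depth-bin centres in metres.
    stride  : hypothetical downsampling factor of the image feature map.
    Output shape is (D, H', W', 3).
    """
    # Pixel centres of the downsampled feature map (perspective space).
    us = np.arange(stride / 2, width, stride, dtype=np.float64)
    vs = np.arange(stride / 2, height, stride, dtype=np.float64)
    d, v, u = np.meshgrid(
        np.asarray(depths, dtype=np.float64), vs, us, indexing="ij"
    )
    # Homogeneous pixel coordinates scaled by depth: d * [u, v, 1].
    pix = np.stack([u * d, v * d, d], axis=-1)
    # Unproject with the inverse intrinsics: X_cam = K^-1 (d * [u, v, 1]).
    return pix @ np.linalg.inv(K).T


# Illustrative values only; they are not taken from the paper.
K = np.array([[1266.0, 0.0, 800.0],
              [0.0, 1266.0, 450.0],
              [0.0, 0.0, 1.0]])
depth_bins = np.linspace(1.0, 50.0, 50)
centers = frustum_voxel_centers(K, width=1600, height=900, depths=depth_bins)
print(centers.shape)
```

Each returned point is the camera-space location of one frustum voxel, so image features attached to a (depth bin, row, column) index could be resampled onto a regular camera-space grid from these coordinates. How the paper performs that resampling is not stated in this preview.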

Funding

Foundation item: the National Natural Science Foundation of China (No. 52072243), and the Sichuan Science and Technology Program (No. 2020YFSY0058)

Author information

Authors and Affiliations

School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Chao Qin (秦超), Yafei Wang (王亚飞) & Chengliang Yin (殷承良)
Shanghai Intelligent and Connected Vehicle R&D Center Co., Ltd., Shanghai, 201499, China
Yuchao Zhang (张宇超)

Corresponding author

Correspondence to Chengliang Yin (殷承良).

About this article

Cite this article

Qin, C., Wang, Y., Zhang, Y. et al. Bird’s-Eye View Semantic Segmentation and Voxel Semantic Segmentation Based on Frustum Voxel Modeling and Monocular Camera. J. Shanghai Jiaotong Univ. (Sci.) 28, 100–113 (2023). https://doi.org/10.1007/s12204-023-2573-3

Received: 08 March 2022
Accepted: 25 April 2022
Published: 07 February 2023
Issue Date: February 2023
DOI: https://doi.org/10.1007/s12204-023-2573-3

CLC number

TP 391.4

Document code

A
