1 Introduction

An intelligent vehicle can be viewed as an intelligent agent that integrates a perception layer, a decision-making layer and an execution layer. Perception technology is an important means for the intelligent vehicle to acquire information about its surroundings. In a traffic scene, a moving vehicle uses a series of lamp signals, prescribed by traffic rules, to remind or prompt other drivers. Therefore, whether an intelligent vehicle can understand the intention behind these lamp signals through its perception module and predict or intervene in behavior in advance is crucial to improving safety.

Recently, with the development of deep learning, target detection technology has become increasingly mature. In particular, vehicle detection has been commercialized, and computer vision allows inferences based on the salient color features of vehicle lamps; combining the two will undoubtedly advance intelligent vehicle perception technology. Li et al. [1] combine Haar features with an Adaboost cascade classifier to detect the vehicle position, then segment the tail lamps in the red, green, blue (RGB) space, and recognize the lamp signal based on a set of recognition rules. Nevertheless, the Adaboost cascade classifier in this algorithm is prone to missed and false detections of vehicle positions, which lowers the lamp signal recognition accuracy. Cui et al. [2] use clustering techniques to extract tail lamp candidates and estimate the tail lamp state; by combining state history information, they infer the current lamp signal meaning of the vehicle ahead. However, the algorithm depends on image texture information for lamp positioning and recognition, and its robustness under complex working conditions is poor. Chen et al. [3] propose to use scattering modeling and reflection direction to segment the vehicle lamp area and complete tail lamp recognition, but the method cannot judge the lamp signal from the tail lamp information. Fröhlich et al. [4] first detect light spots in the image to select the vehicle lamp area, then extract features from a feature-transform-based temporal analysis of the light spot behavior, and finally classify the extracted features with Adaboost to recognize the lamp signal intention. However, an Adaboost classifier trained on tail lamp frequency information can recognize only a few lamp signal types. He et al. [5] build a deep neural network, add an attention mechanism to optimize the network, and thoroughly learn the tail lamp details of vehicles. Nonetheless, the convergence rate of the algorithm is not fully guaranteed, and the recognition accuracy does not generalize well. Based on the YOLOv3-tiny backbone network, Yoneda et al. [6] use a convolutional neural network to detect lamp states and then calculate the flicker frequency with a fast Fourier transform, which detects small-target vehicle lamps well. However, because of the large scale of the convolutional neural network, over-fitting occurs easily if the data set is insufficient.

In summary, the traditional computer vision and machine learning algorithms used in previous work have certain deficiencies in recognizing the semantics of tail lamps, while lamp signal recognition based purely on deep learning relies on a large number of training samples and has high complexity. To this end, this paper proposes a new lamp signal recognition algorithm that combines deep learning with computer vision. The paper is structured as follows: Section 2 uses YOLOv4 [7] to detect the vehicle tail and sets the potential area of the vehicle tail lamps. On this basis, Section 3 proposes a region-based adaptive threshold to segment the lit tail lamp in the hue, saturation, value (HSV) space. In Section 4, a deep neural network (DNN) model is constructed and trained on collected samples, and the lamp signal is classified and predicted according to the pixel information in the HSV space. Section 5 evaluates the performance of the proposed algorithm with real vehicle experiments, and Section 6 gives the summary and discussion. The algorithm proposed in this paper determines the current meaning of the lamp signal of the vehicle ahead, which helps intelligent vehicles understand lamp signals, improves lamp signal recognition accuracy, and lays a foundation for the driving of intelligent vehicles in complex real traffic scenes.

2 Vehicle tail detection

In lamp signal recognition, whether the vehicle tail can be recognized and detected accurately and quickly from the image information captured by the camera is crucial [8]. In this paper, a YOLOv4-based target detection model is established and trained on the KITTI [9] dataset to detect the vehicle tail quickly.

The KITTI dataset was jointly created by the Karlsruhe Institute of Technology in Germany and the Toyota Technological Institute at Chicago. It is currently the largest computer vision dataset in the world for autonomous driving scenarios. KITTI contains real image data collected from urban, rural, and highway scenes, with up to 15 vehicles per image, which meets the requirements for training and testing the 2D object detection of YOLOv4 in this paper.

2.1 YOLOv4 network structure

As shown in Fig. 1, the YOLOv4 network is mainly composed of four parts. First, the CSPDarknet53 backbone feature extraction network down-samples the input image and stacks residual structures. Then, spatial pyramid pooling applies maximum pooling of three different kernel sizes, \(13\times 13\), \(9\times 9\) and \(5\times 5\), to the feature map output by the backbone. Next, the path aggregation network (PANet) fuses the three effective feature layers in both directions: up-sampling and stacking for the higher-level features, and down-sampling and stacking for the lower-level features. Finally, a three-layer feature pyramid with strong semantics is obtained, which is used to classify the target and regress its bounding box.

Fig. 1

YOLOv4 network structure
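To make the pooling step above concrete, the sketch below builds the spatial pyramid pooling block with the three stated kernel sizes in PyTorch; it is a generic illustration under those assumptions, not the paper's exact implementation.

```python
import torch
from torch import nn

class SPPBlock(nn.Module):
    """Spatial pyramid pooling as used in YOLOv4: parallel max pooling at three
    kernel sizes, concatenated with the input along the channel dimension."""
    def __init__(self, kernel_sizes=(13, 9, 5)):
        super().__init__()
        # stride 1 with padding k//2 keeps the spatial size unchanged
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):
        return torch.cat([pool(x) for pool in self.pools] + [x], dim=1)

# Example: a 512-channel 13x13 feature map becomes 2048 channels after SPP.
features = torch.rand(1, 512, 13, 13)
print(SPPBlock()(features).shape)   # torch.Size([1, 2048, 13, 13])
```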

2.2 Vehicle tail detection based on YOLOv4

This paper uses the KITTI dataset for network training; this data set is currently the world's largest computer vision algorithm evaluation data set for autonomous driving scenes [9]. A total of 7,024 pictures are selected and labeled with the vehicle tail, and then divided into training, validation and test sets to build the YOLOv4 model according to the training network parameters. As shown in Fig. 2, an MV-SUA134GC monocular industrial camera from MindVision is used for real vehicle image acquisition experiments. Figure 3 shows the detection of road vehicles.

Fig. 2

Installation of the vehicle-mounted camera

Fig. 3

YOLOv4 vehicle tail detection results
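For readers who want to reproduce the tail detection step, the following is a minimal inference sketch using OpenCV's DNN module with Darknet-format YOLOv4 files; the paper itself trains YOLOv4 on OpenMMLab (Section 5.1), so this is an alternative, hedged illustration, and the file names are placeholders.

```python
import cv2

# Placeholder config/weights for a YOLOv4 model trained on the labeled vehicle tails.
net = cv2.dnn.readNetFromDarknet("yolov4-tail.cfg", "yolov4-tail.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

frame = cv2.imread("road_scene.jpg")   # placeholder test image
class_ids, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
for box, score in zip(boxes, scores):
    x, y, w, h = map(int, box)
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)   # vehicle tail box
cv2.imwrite("detection_result.jpg", frame)
```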

3 Vehicle tail lamp detection

Vehicle tail lamp detection includes determining the position and the state of the tail lamp. First, the ROI (region of interest) containing the vehicle lamp information is obtained according to the structured characteristics of the vehicle appearance, and then the lamp images in different states are segmented by computer vision.

3.1 Determination of the vehicle tail lamp position

With reference to the National Standard GB 4785-2019 of the People's Republic of China, "Installation Regulations for External Lighting and Lamp Signaling Devices of Automobiles and Trailers" [10], and as shown in Fig. 4(a), the vehicle tail lamp area can be divided by appearance and color into three areas A, B and C, each representing a different lamp color and meaning. At the same time, there are certain requirements on the tail lamp installation position. As shown in Fig. 4(b), taking the reversing lamp as an example, the height D of the tail lamp top above the ground is less than 1200 mm, the height E of the tail lamp bottom above the ground is greater than 250 mm, and the distance F between the tail lamps on the two sides is greater than 600 mm. From this information and the vehicle dimensions, the ROI containing the vehicle tail lamps can be calculated.

Fig. 4

Schematic diagram of vehicle tail lamp. A Brake lamp in red area, B turn signal lamp in yellow (orange) area, C other tail lamp areas, D the height of the tail lamp top from the ground, E the height of the tail lamp bottom from the ground, F the distance between the tail lamps on both sides
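As a rough illustration of how the lamp ROIs can be derived from a detected tail bounding box and the installation constraints above, the sketch below crops a left and a right lamp region from the box; the fractional offsets are illustrative assumptions, not values given in the paper.

```python
def tail_lamp_rois(box, left_frac=0.35, right_frac=0.35, top_frac=0.25, bottom_frac=0.75):
    """Estimate left/right tail lamp ROIs inside a vehicle-tail box (x, y, w, h).

    The vertical band [top_frac, bottom_frac] mimics the GB 4785 height limits
    (lamp top below 1200 mm, lamp bottom above 250 mm) scaled to the box; the
    horizontal fractions keep the two ROIs near the outer edges, consistent with
    the minimum 600 mm lamp separation. All fractions are illustrative.
    """
    x, y, w, h = box
    y0, y1 = y + int(top_frac * h), y + int(bottom_frac * h)
    left_roi = (x, y0, int(left_frac * w), y1 - y0)                           # (x, y, w, h)
    right_roi = (x + w - int(right_frac * w), y0, int(right_frac * w), y1 - y0)
    return left_roi, right_roi

# Example with a detected tail box of 200x150 pixels at (320, 180).
print(tail_lamp_rois((320, 180, 200, 150)))
```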

3.2 Hue, saturation, value space-based vehicle lamp detection

Hue, saturation, value (HSV) [11] space is a color space that describes color characteristics. Compared with RGB space, HSV space can better characterize human visual perception. Usually, the image collected by the optical sensor is in RGB format. Specifically, this paper first normalizes the R, G, and B channels of the pixel, and then calculates the maximum and minimum channel values \({C}_{max}\) and \({C}_{min}\) according to formulas (1) and (2). After obtaining the channel difference \(\Delta \) (formula (3)), the corresponding \(\mathrm{H}\), \(\mathrm{S}\) and \(\mathrm{V}\) channel values of pixels can be calculated according to the values of \(\Delta \) and \({C}_{max}\) (formulas (4), (5) and (6)). Through the above operations, the color space conversion is completed.

$${C}_{max}=\mathrm{max}\left(\frac{R}{255},\frac{G}{255},\frac{B}{255}\right)$$
(1)
$${C}_{min}=\mathrm{min}\left(\frac{R}{255},\frac{G}{255},\frac{B}{255}\right)$$
(2)
$$\Delta ={C}_{max}-{C}_{min}$$
(3)
$$\mathrm{H}=\left\{\begin{array}{c}{0}^{^\circ } , \Delta =0\\ {60}^{^\circ }\times \left(\frac{G-B}{255\times \Delta }+0\right) ,{C}_{max}=\frac{R}{255}\\ {60}^{^\circ }\times \left(\frac{B-R}{255\times \Delta }+2\right) ,{C}_{max}=\frac{G}{255}\\ {60}^{^\circ }\times \left(\frac{R-G}{255\times \Delta }+4\right) ,{C}_{max}=\frac{B}{255}\end{array}\right.$$
(4)
$$\mathrm{S}=\left\{\begin{array}{c}0 ,{C}_{max}=0\\ \frac{\Delta }{{C}_{max}} ,{C}_{max}\ne 0\end{array}\right.$$
(5)
$$\mathrm{V}={C}_{max}$$
(6)

In the above formulas, \(R\), \(G\) and \(B\) respectively represent the pixel values in the red, green and blue channels of the RGB space, and \(\mathrm{H}\), \(\mathrm{S}\) and \(\mathrm{V}\) represent the hue, saturation and brightness value of the pixel in the HSV space.
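For reference, formulas (1)–(6) can be written directly in Python; the per-pixel sketch below follows them, with a standard mod-6 wrap (not shown in formula (4)) guarding against negative hues. Note that OpenCV's cv2.cvtColor with COLOR_BGR2HSV stores H as H/2 (0–179) and scales S and V to 0–255, which is the convention assumed by the thresholds later in this section.

```python
import numpy as np

def rgb_to_hsv(r, g, b):
    """Convert one 8-bit RGB pixel to (H in degrees, S in [0,1], V in [0,1]),
    following formulas (1)-(6)."""
    rp, gp, bp = r / 255.0, g / 255.0, b / 255.0        # normalized channels
    c_max, c_min = max(rp, gp, bp), min(rp, gp, bp)     # formulas (1), (2)
    delta = c_max - c_min                               # formula (3)

    if delta == 0:                                      # formula (4)
        h = 0.0
    elif c_max == rp:
        h = 60.0 * (((gp - bp) / delta) % 6)            # mod 6 keeps H in [0, 360)
    elif c_max == gp:
        h = 60.0 * ((bp - rp) / delta + 2)
    else:  # c_max == bp
        h = 60.0 * ((rp - gp) / delta + 4)

    s = 0.0 if c_max == 0 else delta / c_max            # formula (5)
    v = c_max                                           # formula (6)
    return h, s, v

# Example: a bright red pixel; OpenCV would report H/2, S*255, V*255.
print(rgb_to_hsv(220, 30, 25))
```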

In vehicle lamp recognition, the key lies in the difference between the on and off states of the vehicle lamp in the HSV space. As shown in Fig. 5, the HSV space conversion is performed on the ROI area of the vehicle tail lamp in the two states to generate a three-dimensional point cloud image. The point cloud image represents the distribution of hue, saturation and brightness of each pixel in the image.

Fig. 5

Distribution of pixels in HSV space. a The lighting state of the vehicle lamp. b The lamp off state

By comparing Fig. 5a and b, it can be seen that when the vehicle lamp is on, the value (brightness) component changes significantly compared with the off state, and the hue component of the image presents a bimodal characteristic. The division of the color thresholds in the HSV space is detailed in Table 1 [14]. It can be seen that the three colors yellow, orange and red that appear in the tail lamp occupy different, widely separated ranges in the hue component, while the saturation component has a range similar to that of the value component.

Table 1 HSV space color data

Therefore, according to the above characteristics, a specific threshold can be set to segment the pixels in the lighting state [12, 13]. In this paper, the upper cut-off threshold is set as \(\left\{{\widehat{H}}_{max}=180, {\widehat{S}}_{max}=255, {\widehat{V}}_{max}=255\right\}\). In the lower cut-off threshold, \({\widehat{H}}_{min}=0\) and \({\widehat{S}}_{min}=43\) are fixed, and the choice of \({\widehat{V}}_{min}\) determines the segmentation effect. As shown in Fig. 6, segmentation results are given for \({\widehat{V}}_{min}\) equal to 100, 150, 200 and 250. Comparison with the original image reveals that if \({\widehat{V}}_{min}\) is too small, the lighting area cannot be distinguished, and if \({\widehat{V}}_{min}\) is too large, image details are lost, so the selection of a reasonable segmentation threshold is crucial. Since a fixed threshold cannot meet the segmentation accuracy requirement and its manual tuning is time-consuming and labor-intensive, this paper proposes a region-based adaptive threshold segmentation algorithm.

Fig. 6

HSV space threshold segmentation results
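As a minimal illustration of the fixed-threshold segmentation compared in Fig. 6 (not the adaptive method proposed next), the sketch below applies cv2.inRange in HSV space for several candidate \({\widehat{V}}_{min}\) values; the input file name is a placeholder.

```python
import cv2

roi_bgr = cv2.imread("tail_lamp_roi.png")            # placeholder path to a tail lamp ROI
roi_hsv = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2HSV)   # OpenCV HSV: H in [0,179], S,V in [0,255]

for v_min in (100, 150, 200, 250):
    # Lower/upper cut-off thresholds from Section 3.2; only V_min varies.
    mask = cv2.inRange(roi_hsv, (0, 43, v_min), (180, 255, 255))
    segmented = cv2.bitwise_and(roi_bgr, roi_bgr, mask=mask)   # keep only the lit pixels
    cv2.imwrite(f"segmented_vmin_{v_min}.png", segmented)
```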

3.3 Region-based adaptive threshold segmentation algorithm

According to the color characteristics of the vehicle tail lamp and its brightness characteristics in the lighting state, and based on the distribution of each pixel in the HSV space, this paper sets three areas: the red area, the yellow (orange) area and the other areas. The red area is set as (\({H}_{max}^{R}\sim {H}_{min}^{R}\), \({S}_{max}^{R}\sim {S}_{min}^{R}\), \({V}_{max}^{R}\sim {V}_{min}^{R}\)), the yellow (orange) area as (\({H}_{max}^{Y}\sim {H}_{min}^{Y}\), \({S}_{max}^{Y}\sim {S}_{min}^{Y}\), \({V}_{max}^{Y}\sim {V}_{min}^{Y}\)), and the other areas as (\({H}_{max}^{E}\sim {H}_{min}^{E}\), \({S}_{max}^{E}\sim {S}_{min}^{E}\), \({V}_{max}^{E}\sim {V}_{min}^{E}\)). Here, \({H}_{max}\sim {H}_{min}\) limits the color in the hue channel, \({S}_{max}\sim {S}_{min}\) limits it in the saturation channel, and \({V}_{max}\sim {V}_{min}\) limits it in the brightness channel. The value \({V}_{min}\) is tied to \({V}_{max}\) by \({V}_{min}={V}_{max}-a\), where \({V}_{max}\) is the maximum \(\mathrm{V}\) channel value of the pixels of the corresponding area that fall within its \(\mathrm{H}\) and \(\mathrm{S}\) ranges. This paper sets the constant \(a=15\) to limit the extent of each area in the \(\mathrm{V}\) channel.

According to the areas established above, the \(\mathrm{V}\) channel threshold \({\widehat{V}}_{min}\) can be selected adaptively; formula (7) and the following steps describe the selection:

$${\widehat{V}}_{min}=\left\{\begin{array}{c}255 , \mathrm{arg}{V}^{R}<\mathrm{arg}{V}^{E}+c , \mathrm{arg}{V}^{Y}<\mathrm{arg}{V}^{E}+c\\ {V}_{max}^{R} , \mathrm{arg}{V}^{Y}>\mathrm{arg}{V}^{R} , \mathrm{arg}{V}^{R}<\mathrm{arg}{V}^{E}+c\\ {V}_{max}^{Y} , \mathrm{arg}{V}^{R}>\mathrm{arg}{V}^{Y} , \mathrm{arg}{V}^{Y}<\mathrm{arg}{V}^{E}+c\\ {V}_{max}^{E} , \mathrm{arg}{V}^{R}>\mathrm{arg}{V}^{E}+c , \mathrm{arg}{V}^{Y}>\mathrm{arg}{V}^{E}+c\end{array}\right.$$
(7)
(1) Calculate the average \(\mathrm{V}\) channel value in each of the three areas: \(\mathrm{arg}{V}^{R}\) in the red area, \(\mathrm{arg}{V}^{Y}\) in the yellow (orange) area, and \(\mathrm{arg}{V}^{E}\) in the other areas.

(2) Lamp off state: according to formula (7), this paper sets the pixel perturbation constant \(c=10\), determined from a large number of experiments, to reduce the influence of noise pixels on the segmentation. If both \(\mathrm{arg}{V}^{R}\) and \(\mathrm{arg}{V}^{Y}\) are smaller than \(\mathrm{arg}{V}^{E}+c\), the threshold is set to \({\widehat{V}}_{min}={\widehat{V}}_{max}=255\). No pixels are segmented, and the image is converted into a binary image in which every pixel value is 0, that is, black.

(3) Lamp lighting state: there are three lighting states: the yellow (orange) lamp is on, the red lamp is on, or the two colored lamps are on together. According to formula (7), if \(\mathrm{arg}{V}^{Y}>\mathrm{arg}{V}^{R}\) and \(\mathrm{arg}{V}^{R}<\mathrm{arg}{V}^{E}+c\), the threshold is set to \({\widehat{V}}_{min}={V}_{max}^{R}\), where \({V}_{max}^{R}\) is the maximum \(\mathrm{V}\) channel value in the red area. The high-brightness yellow (orange) pixels are segmented from the image and converted into a binary image: the segmented pixels take the value 1 (white, the foreground) and the remaining pixels take the value 0 (black, the background), and the yellow (orange) lamp is judged to be on. In the same way, if \(\mathrm{arg}{V}^{R}>\mathrm{arg}{V}^{Y}\) and \(\mathrm{arg}{V}^{Y}<\mathrm{arg}{V}^{E}+c\), the threshold is set to \({\widehat{V}}_{min}={V}_{max}^{Y}\), the high-brightness red pixels are segmented, and the red lamp is judged to be on. When both \(\mathrm{arg}{V}^{R}\) and \(\mathrm{arg}{V}^{Y}\) are greater than \(\mathrm{arg}{V}^{E}+c\), the threshold is set to \({\widehat{V}}_{min}={V}_{max}^{E}\), where \({V}_{max}^{E}\) is the maximum \(\mathrm{V}\) channel value in the other areas; the high-brightness red and yellow (orange) pixels are both segmented, and the red lamp and the yellow (orange) lamp are judged to be on at the same time.

(4) Take the segmented binary image as a mask and perform an "AND" operation with the original image: white mask pixels correspond to pixels kept from the original image, and black mask pixels correspond to pixels removed from it. A code sketch of this selection procedure is given below.
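The following sketch shows one possible implementation of steps (1)–(4), assuming an ROI already converted to OpenCV's HSV convention (H in 0–179, S and V in 0–255). The constants \(a=15\) and \(c=10\) follow the definitions above, but the exact H/S ranges of the three areas and the input file name are illustrative assumptions.

```python
import cv2
import numpy as np

A, C = 15, 10   # V-channel span per area and pixel perturbation constant (Section 3.3)

# Illustrative H/S ranges (OpenCV H in [0,179]); the paper fixes these per area.
REGIONS = {
    "R": [((0, 43), (10, 255)), ((156, 43), (180, 255))],   # red: two hue intervals
    "Y": [((11, 43), (34, 255))],                           # yellow / orange
    "E": [((35, 0), (155, 255))],                           # other hues
}

def region_stats(hsv):
    """Return (mean V over the area's top slice, max V) for each area.
    Each area spans its H/S ranges and V in [V_max - a, V_max] (Section 3.3)."""
    h, s, v = cv2.split(hsv)
    stats = {}
    for name, ranges in REGIONS.items():
        mask = np.zeros(h.shape, bool)
        for (h_lo, s_lo), (h_hi, s_hi) in ranges:
            mask |= (h >= h_lo) & (h <= h_hi) & (s >= s_lo) & (s <= s_hi)
        vals = v[mask].astype(float)
        if vals.size == 0:
            stats[name] = (0.0, 0)
            continue
        v_max = vals.max()
        top = vals[vals >= v_max - A]          # restrict to V in [V_max - a, V_max]
        stats[name] = (float(top.mean()), int(v_max))
    return stats

def adaptive_v_min(stats):
    """Select the threshold according to formula (7)."""
    (avg_r, max_r), (avg_y, max_y), (avg_e, max_e) = stats["R"], stats["Y"], stats["E"]
    if avg_r < avg_e + C and avg_y < avg_e + C:
        return 255                     # step (2): lamps off, nothing is segmented
    if avg_y > avg_r and avg_r < avg_e + C:
        return max_r                   # step (3): only the yellow (orange) lamp is lit
    if avg_r > avg_y and avg_y < avg_e + C:
        return max_y                   # step (3): only the red lamp is lit
    return max_e                       # step (3): red and yellow (orange) lamps lit together

roi_bgr = cv2.imread("tail_lamp_roi.png")                   # placeholder path
hsv = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2HSV)
v_min = adaptive_v_min(region_stats(hsv))
mask = cv2.inRange(hsv, (0, 43, v_min), (180, 255, 255))    # binary image of lit pixels
lit_pixels = cv2.bitwise_and(roi_bgr, roi_bgr, mask=mask)   # step (4): mask "AND" original
```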

As shown in Fig. 7, this algorithm is used to detect vehicle lamps in different states. It can be seen that, with the adaptive \(\mathrm{V}\) channel threshold, the lit lamp pixels of different colors are segmented well.

Fig. 7

Threshold segmentation results of this algorithm

4 Semantic recognition of vehicle tail lamp based on deep neural network

A driver usually predicts the behavior of the vehicle ahead from the state of its tail lamps. Therefore, an intelligent vehicle needs to learn this reasoning and perform semantic recognition of tail lamps. The lamp signal semantics of a lit vehicle tail lamp are usually described by the tail lamp color, single/double side, and state, as shown in Table 2 [15].

Table 2 Common manifestations of lamp signal

According to the different expressions of the lamp signal shown in Table 2, tail lamp images of vehicles are collected as the training, validation and test samples of the deep neural network (DNN) [16]. According to the HSV space color data, the range of the yellow (orange) pixels of the left tail lamp in the \(\mathrm{H}\) channel is defined as \(\left(11-34\right)\), and the average value of the \(\mathrm{H}\) channel pixels within this range is recorded as \({X}_{1}\). The range of the red pixels of the left tail lamp in the \(\mathrm{H}\) channel is defined as \(\left(0-10\right)\cup \left(156-180\right)\), and the average value of the \(\mathrm{H}\) channel pixels within this range is recorded as \({X}_{2}\). Similarly, the average value \({X}_{3}\) of the yellow (orange) pixels and the average value \({X}_{4}\) of the red pixels of the right tail lamp in the \(\mathrm{H}\) channel are calculated. \(X=\left({X}_{1},{X}_{2},{X}_{3},{X}_{4}\right)\) is recorded as the input feature vector. According to the lit pixels in the image and the lamp signal expression form, the name of the lamp signal is recorded: the left turn signal lamp is defined as \({y}_{1}\), the right turn signal lamp as \({y}_{2}\), and the brake lamp as \({y}_{3}\); if the brake lamp and the left turn signal lamp are on at the same time, it is recorded as \({y}_{4}\); if the brake lamp and the right turn signal lamp are on at the same time, it is recorded as \({y}_{5}\); and if the lamps on both sides are off, it is recorded as \({y}_{6}\). \(Y=\left({y}_{1},{y}_{2},{y}_{3},{y}_{4},{y}_{5},{y}_{6}\right)\) is recorded as the true value.
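A minimal sketch of this feature extraction is given below, assuming the segmented HSV image from Section 3.3 has already been split into left and right tail lamp ROIs; the hue boundaries follow Table 1 and the helper names are illustrative.

```python
import numpy as np

YELLOW = [(11, 34)]              # yellow (orange) hue range (OpenCV scale), per Table 1
RED = [(0, 10), (156, 180)]      # red hue ranges

def mean_hue(hsv, ranges):
    """Average H value of the lit pixels (V > 0 after segmentation) whose hue
    falls inside the given ranges; returns 0 if no such pixel exists."""
    h, v = hsv[:, :, 0], hsv[:, :, 2]
    lit = v > 0                                   # background was zeroed by the mask
    in_range = np.zeros(h.shape, bool)
    for lo, hi in ranges:
        in_range |= (h >= lo) & (h <= hi)
    vals = h[lit & in_range]
    return float(vals.mean()) if vals.size else 0.0

def lamp_features(hsv_left, hsv_right):
    """Build the DNN input X = (X1, X2, X3, X4) from the two segmented lamp ROIs."""
    return np.array([
        mean_hue(hsv_left, YELLOW),    # X1: left lamp, yellow (orange) pixels
        mean_hue(hsv_left, RED),       # X2: left lamp, red pixels
        mean_hue(hsv_right, YELLOW),   # X3: right lamp, yellow (orange) pixels
        mean_hue(hsv_right, RED),      # X4: right lamp, red pixels
    ], dtype=np.float32)
```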

Figure 8 shows the 3-layer DNN structure used in this paper, which is composed of an input layer, a hidden layer and an output layer. Here, \(n=100\) represents the number of neurons in the hidden layer; the input layer takes the 4 features and the output layer outputs the 6 categories. The layers are fully connected, and such a DNN can be applied to multi-classification problems; in this paper, the lamp signal class is judged from the pixel features.

Fig. 8

3-layer DNN structure

The DNN training process includes forward propagation of the signal and back propagation of the error [17]. In forward propagation, the DNN linear operation is given by formulas (8), (9) and (10), where \({A}^{k}=\left({a}_{1}^{k} {a}_{2}^{k} {a}_{3}^{k} \cdots {a}_{100}^{k}\right)\) is the result of the k-th layer linear operation, \(X=\left({X}_{1} {X}_{2} {X}_{3} {X}_{4}\right)\) is the input signal, \({W}^{k}\) is the weight of the k-th layer, and \({B}^{k}\) is the bias of the k-th layer.

$${A}^{k}=X{W}^{k}+{B}^{k} k=\mathrm{1,2}$$
(8)
$${W}^{k}={\left({w}_{1n}^{k} {w}_{2n}^{k} {w}_{3n}^{k} {w}_{4n}^{k}\right)}^{T}$$
(9)
$${B}^{k}={\left({b}_{1}^{k} {b}_{2}^{k} {b}_{3}^{k} \cdots {b}_{n}^{k}\right)}^{T} n=100$$
(10)

As shown in formula (11), an activation function is applied to the result of the linear operation, where \({Z}^{k}\) is the output of the k-th layer after activation. The sigmoid function is used in the hidden layer, and the output is normalized by the softmax function given in formula (12), where \({a}_{j}\) represents the input value of the \(j\)-th neuron: the numerator is the exponential of the current signal and the denominator is the sum of the exponentials of all input signals.

$${Z}^{k}=h\left({A}^{k}\right)$$
(11)
$$\mathrm{h}\left({a}_{j}\right)=\frac{\mathrm{exp}\left({a}_{j}\right)}{\sum_{i=1}^{n}exp\left({a}_{i}\right)} j=\mathrm{1,2},3,\cdots ,n$$
(12)

In error back propagation, the gradient descent method is used to minimize the cost function and update the weights. Formula (13) gives the sample cost function; in order to prevent over-fitting of the model, an \({L}^{2}\) regularization term is added to it.

$$\mathrm{J}\left(w\right)=\frac{1}{2m}\left(\sum_{i=1}^{n}{\left({\widehat{y}}_{i}-{y}_{i}\right)}^{2}+\lambda \sum_{j=1}^{n}{w}_{j}^{2}\right)$$
(13)

where \(m\) is the number of samples, \({\widehat{y}}_{i}\) is the predicted value of the model, \({y}_{i}\) is the true value (the mean square error between the two is computed), and \(\uplambda \) is the regularization hyperparameter. In this paper, the Adam adaptive learning rate optimizer [15] is used to improve the weight update efficiency, with the initial learning rate set to 0.001. To reduce the computational cost, mini-batch learning is used: the batch size is set to 15 and the maximum number of epochs is set to 80, so that the model converges quickly and completes training in a short time. Figure 9 shows the relationship between the number of iterations and the accuracy on the training set and test set. The curve is monotonically increasing on the whole; after 58 iterations the accuracy of the test set and training set reaches its extreme value and remains stable as the number of iterations increases, indicating that the DNN training is complete.

Fig. 9

Accuracy of test data and training data
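A minimal PyTorch sketch of this classifier and training setup is shown below, under the stated hyperparameters: one hidden layer of 100 sigmoid units, a softmax output, the mean square error cost of formula (13) with the L2 term expressed as weight decay, Adam with learning rate 0.001, batch size 15 and 80 epochs. The weight-decay value and the `X_train`/`y_train` tensors are placeholders, not values from the paper.

```python
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 4 hue features per sample, 6 lamp signal classes (y1..y6).
X_train = torch.rand(334, 4)                    # hypothetical feature tensor
y_train = torch.randint(0, 6, (334,))           # hypothetical class indices

model = nn.Sequential(
    nn.Linear(4, 100), nn.Sigmoid(),            # formulas (8) and (11), k = 1
    nn.Linear(100, 6), nn.Softmax(dim=1),       # formulas (8) and (12), k = 2
)

criterion = nn.MSELoss()                        # mean square error term of formula (13)
# weight_decay stands in for the L2 term of formula (13); its value is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
loader = DataLoader(TensorDataset(X_train, y_train), batch_size=15, shuffle=True)

for epoch in range(80):                         # maximum number of epochs
    for xb, yb in loader:
        optimizer.zero_grad()
        pred = model(xb)                        # signal forward propagation
        target = F.one_hot(yb, num_classes=6).float()
        loss = criterion(pred, target)          # cost of formula (13)
        loss.backward()                         # error back propagation
        optimizer.step()                        # Adam weight update
```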

5 Experimental results

5.1 Sample training and evaluation

In order to verify the validity of the lamp signal recognition algorithm proposed in this paper, an experimental platform was built for real-vehicle tests. The camera frame rate was 211 FPS and the resolution was \(1920\times 1200\). The computer used in the experiment was configured with an i7-9700 CPU and an RTX 2080 SUPER graphics card, running Windows 10, and the algorithm was implemented in Python. 7,024 images were selected from the KITTI dataset. As shown by the green boxes in Fig. 10, labelImg v1.4.0 is used to label the vehicle tails in the images, and the labeled images are divided into a training set and a test set at a ratio of 9:1.

Fig. 10

Labeling of the vehicle tail

YOLOv4 was built on the OpenMMLab [18] platform. Figure 11 shows the accuracy of YOLOv4 on the test samples: after 64 iterations the accuracy reaches 95.3%, and the single-frame detection speed reaches 63 FPS, which meets the real-time detection requirements of real scenes.

Fig. 11

Accuracy of YOLOv4 against the validation set

The DNN data set is collected from videos of daytime urban traffic scenes. 326 pictures containing a total of 418 vehicle driving scenes are selected as samples. As shown in Table 3, 80% of the samples are used as the training set and 20% as the test set, classified according to the tail lamp state of the vehicles in the collected pictures.

Table 3 Sample classification

5.2 Visualization of test results

Figure 12 shows the results when the proposed algorithm is used to detect actual scenes in which the brake lamp, the left turn signal lamp, the right turn signal lamp, or the brake lamp together with a turn signal lamp are on; single targets and multiple targets are detected respectively. The red box represents the vehicle tail detection result, the red title represents the true lamp signal, and the green title represents the lamp signal detected by the algorithm.

Fig. 12

Tail lamp recognition results

5.3 Verification of detection accuracy

In this paper, videos of three different road sections were collected, each 20 s long, giving 4,220 image frames in total. Based on the tail lamp behavior of the vehicles in the videos, the vehicles were labeled with ground-truth lamp signal tags. As shown in Fig. 13, a visualization page developed with Python's Tkinter package was used to record the continuous-frame detection results of the algorithm. The detection results are shown in Tables 4, 5 and 6.

Fig. 13

Visual interface in real vehicle test

Table 4 Brake lamp (two-side red signal lamp) detection
Table 5 Left turn signal lamp (left signal lamp on and right signal lamp off) detection
Table 6 Right turn signal lamp (right signal lamp on and left signal lamp off) detection

In order to quantitatively evaluate the performance of the lamp signal recognition system, the accuracy rate is defined in formula (14). In both the quantitative and qualitative evaluation, this paper stipulates that the detection result represents the state of the overall lamp signal at the current moment (the signal lamps on both sides jointly determine this state).

$$\mathrm{Accuracy}=\frac{Correct\,detection\,number}{Number\,of\,lamps}$$
(14)

It can be seen from Tables 4, 5 and 6 that the algorithm achieves an accuracy of more than 70% when recognizing vehicle lamp signals in complex urban traffic scenes, and the average accuracy in recognizing the brake lamp, the left turn signal lamp and the right turn signal lamp reaches 81.3%, 75.8% and 76.4%, respectively. To further verify the superiority of the proposed method, it is compared with [1] and [2]. All experiments use the same samples collected in this paper, and the turn signal recognition accuracy is computed over both the left and right sides. The corresponding Adaboost and SVM models are trained for [1] and [2], respectively, for tail lamp region determination and lamp signal classification. As shown in Table 7, on the 4,220 test frames, the algorithm proposed in this paper achieves a turn signal recognition accuracy of 76.3%, a gain of 3.2% and 10.5% over [1] and [2], respectively. At the same time, the proposed algorithm achieves the highest brake lamp recognition accuracy of 81.3%, which is 3.9% and 7.7% higher than [1] and [2].

Table 7 Algorithm comparison

6 Conclusions

In traffic scenes, lamp signals are the language of vehicle-to-vehicle communication, and accurately recognizing the semantics of vehicle tail lamps is key to the development of intelligent driving. For the detection of vehicle tail lamp pixels and the understanding of their semantics, this paper proposes a lamp signal recognition algorithm that combines deep learning with computer vision.

(1) The vehicle tail in the KITTI data set is labeled, and YOLOv4 is trained to complete vehicle tail detection. The ROI is then determined from the tail lamp positions on the vehicle to reduce the computational cost of the tail lamp pixel search.

(2) HSV space segmentation is performed on the tail lamps in the ROI. By comparing the distribution characteristics of the tail lamps in the HSV space in the off and on states, a region-based adaptive threshold is proposed: with the H and S channel ranges fixed, the optimal V channel threshold is searched for to segment the lit pixels of tail lamps of different colors.

(3) The average H channel value of the lit lamp pixels is collected after segmentation, and the DNN model is trained based on the lamp signal expression form. The model then completes the semantic recognition of the brake lamp, the turn signal lamps, and the lamp-off state.

(4) Experiments on actual urban traffic scenes show that the algorithm has an average accuracy of 81.3%, 75.8%, and 76.4% for the brake lamp, left turn signal lamp, and right turn signal lamp, respectively. However, missed and false detections may still occur due to the vehicle location and camera imaging quality. In future work, we will consider building a larger database of vehicle tail lamps, use deep learning to infer tail lamp occlusion, and further improve the lamp signal recognition accuracy.