
1 Introduction

Human factors have been reported to be connected to 65% of all automobile accidents and 80% of all heavy truck accidents (Dhillon 2007). Eliminating traffic accidents caused by human error is one of the most promising benefits that Automated Driving Systems (ADSs) can provide (Fagnant and Kockelman 2015). However, current ADSs are not yet autonomous systems that operate in all situations without any need for human intervention. The Society of Automotive Engineers (SAE) defined six levels of driving automation, from level 0 to level 5 (SAE 2016). According to the SAE definition, an ADS at its lower levels (SAE levels 2–3) is able to perform dynamic driving tasks, but the human driver still has to take over control of the vehicle whenever a takeover request (TOR) is issued. In other words, at these levels, the driver’s role changes from an active operator to a fallback-ready driver.

For current vehicle drivers, the requirements of being a “fallback-ready driver” can be very confusing (Victor et al. 2018), and factors such as high workload (Radlmayr et al. 2019; Wu et al. 2020) are known to impair their correct use of driving automation. In addition, driver drowsiness has been shown to increase faster in automated driving than in manual driving (Schömig et al. 2015). However, research on the effects of different drowsiness levels on drivers’ takeover performance has produced conflicting results. Saxby et al. (2008) reported slower response times to an emergency in a drowsiness (or “passive fatigue”) condition evoked by a prolonged interval of automated driving (10 and 30 min). In a follow-up study, Saxby et al. (2013) replicated these results and observed a higher probability of crash. Similarly, Jarosch et al. (2019) found impaired reaction times and takeover quality in a task-induced drowsiness condition. By contrast, a series of studies by Feldhütter and her colleagues produced a different pattern of results. Feldhütter et al. (2017) triggered a TOR after 5 min or 20 min of automated driving and reported no difference in takeover performance between those two conditions. Feldhütter et al. (2018) triggered a TOR after drivers reached a certain level of drowsiness; takeover quality deteriorated, but reaction times were not slower for the drowsy drivers. Feldhütter et al. (2019) adopted the same design in a more urgent condition but still found no evidence of slower reactions to TOR. Gonçalves et al. (2016) triggered a TOR event when self-reported drowsiness reached a specific level, rather than after a constant duration, and reported poorer performance, but not slower reactions, in responding to TOR for drowsy drivers compared with those who were not drowsy. Further, Schmidt et al. (2017) and Wu et al. (2019a) reported no significant effects of drivers’ drowsy state on either takeover time or takeover quality. As suggested by Feldhütter et al. (2018), these inconsistent findings may be attributable to differences in study design.

Saxby et al. (2008, 2013), Feldhütter et al. (2017), Wu et al. (2019a), and Jarosch et al. (2019) triggered a TOR after a predefined duration of automated driving. This duration-based design overlooks the possibility that drowsiness develops very differently across drivers, which may have produced the inconsistent results. Not all drivers become drowsy after a constant, predefined duration. For example, Wu et al. (2020) identified age as a significant factor affecting drivers’ development of drowsiness during automated driving. The presence of both drowsy and alert drivers in a “drowsy” group defined by driving duration may have obscured significant differences in performance and thus produced unclear results.

To ensure that all drivers in the “drowsy” group are actually drowsy, a TOR should be triggered when a certain level of drowsiness is self-reported or observed by a trained rater. However, such a design has also produced inconsistent results (Gonçalves et al. 2016; Schmidt et al. 2017; Feldhütter et al. 2018, 2019). In these studies, drowsiness was generally treated as a dichotomous state, with a predefined threshold (usually the appearance of some drowsiness markers) and a TOR triggered as soon as this threshold was reached. In this design, data on driver performance under extremely drowsy conditions were not available, because the TOR was already triggered (and the experiment completed) at the time the predefined threshold was reached. Considering that drowsiness is a transitional state between wakefulness and sleep (Johns 1998) and that the relationship between drowsiness level and task performance seems to be nonlinear (Kaida et al. 2006), the effect of drowsiness on takeover performance may also be nonlinear. In other words, it is possible that a driver could react appropriately when moderately drowsy but inappropriately when highly drowsy. If so, the inconsistency in existing studies could be partially explained by differences in how the “drowsy” group was defined and manipulated.

The goal of this experimental study is to investigate the following two research questions regarding the effects of drowsiness on driver’s performance in responding to TOR:

  • Does an increase in drowsiness level after a long driving duration negatively affect drivers’ takeover performance?

  • If a moderate drowsiness level does not affect reaction times to TOR, does a high drowsiness level affect them?

2 Methods

2.1 Participants and Apparatus

A total of 40 drivers (21–56 years old, mean = 39.1, SD = 11.9) were recruited from the local community. To avoid age and gender bias, five females and five males were recruited in each of four age groups (20s, 30s, 40s, and 50s). In the recruiting advertisement, potential participants were required to hold a valid driver’s license, drive frequently in their daily lives, be in normal health, have no history of motion sickness, and have normal vision. On average, the participants had 20.2 ± 11.6 years of driving experience and drove 14.8 ± 14.2 thousand kilometers per year. The National Institute of Advanced Industrial Science and Technology (AIST) Safety and Ethics committee approved the study, and each participant gave written informed consent before the experiment. Before the experiment, participants were required to complete a basic driver profile questionnaire, a driving style questionnaire, and a workload sensitivity questionnaire. All participants were paid for their participation.

All experiments were conducted in a fixed-base, medium-fidelity driving simulator (Mitsubishi Precision Co.). As shown in Fig. 1, the front and side views were presented on three liquid crystal displays. The driving simulator provided real-time, high-fidelity visual and auditory feedback to participants. A steering control loading system (Moog Inc.) modulated the reactive force against the steering input. All data (e.g., speed of the ego vehicle) were sampled at a frequency of 60 Hz. In the present study, the driving simulator was programmed to mimic the behavior of an SAE level 3 automated vehicle. In the automated mode, both longitudinal and lateral driving tasks were controlled by the system.

Fig. 1. The fixed-base driving simulator and scene of the experiment.

2.2 Drowsiness Measures

Karolinska Sleepiness Scale (KSS).

Subjective drowsiness was measured by the Karolinska Sleepiness Scale (KSS; Åkerstedt and Gillberg 1990), which is a widely used 9-step self-report scale of drowsiness (1 = extremely alert, 3 = alert, 5 = neither alert nor sleepy, 7 = sleepy but no difficulty remaining awake, 9 = extremely sleepy and fighting sleep). Kaida et al. (2006) translated KSS into Japanese and verified its association with objective drowsiness measures. In the present experiment, a printed Japanese version of the scale was pasted on the left side of the vehicle dashboard. After each takeover event, drivers were asked to reflect and rate their drowsiness level at the exact moment before the TOR was issued. The auditory instruction for the driver to orally report subjective drowsiness level was programmed into the simulator scenario and prompted automatically.

Rated Drowsiness by Trained Experts.

Each driver’s drowsiness level was also measured by two trained experts who observed driver eye movements, facial expressions, and behavioral indicators online on two separate screens during automated driving. Each observer independently estimated the driver’s drowsiness level at 1 min intervals. The scale used to record observations was originally developed by Kitajima et al. (1997) and expanded by Homma (2016). As shown in Table 1, this scale has 6 levels (0 to 5), where 0 indicates that the driver is awake and motivated and 5 indicates that the driver is fully asleep. Following the definition of Homma (2016), scale levels 0 and 1 indicate low drowsiness, levels 2 and 3 indicate medium drowsiness, and levels 4 and 5 indicate high drowsiness. The level of drowsiness rated during the last 1 min epoch before the occurrence of the TOR was used.

Table 1. Drowsiness and scale levels and respective behavioral markers used by trained experts. The descriptors have been translated from an original Japanese version developed by Kitajima et al. (1997) and adopted by Homma (2016).

2.3 Experiment Design

To evoke multiple levels of drowsiness, the duration of automated driving before a TOR was set to four levels: 2 min, 4 min, 8 min, and 16 min. To reduce the effect of surprise and to examine effects on an individual level, the experiment adopted a within-subjects repeated-measures design.

Procedures.

Each participant came to the laboratory during one of four 2 h sessions (08:30–10:30, 10:30–12:30, 13:30–15:30, or 15:30–17:30) on 3 different days to complete the same experimental procedure 3 times. The participant was asked to sleep for the same duration on the night before each participation day and to maintain the same caffeine intake and amount of exercise across the 3 participation days. Participants signed informed consent forms only on their first day of participation; otherwise, the experimental procedure was consistent across the 3 participation days.

After participants arrived at the laboratory, the experimenters presented an overview of the experimental procedure and the operation of the driving simulator. Before starting the experiment, all participants were required to leave their cell phones, watches, and any other personal items that might interfere with the experimental process in a locker. The experimenter then instructed the participant to put on electrooculography (EOG) electrodes and electrocardiography (ECG) electrodes. On each participation day, the participant first practiced driving the simulated automated vehicle for about 4 min to become familiar with the simulator and the ADS. During the practice drive, the participant experienced a takeover event that was exactly the same as that in the formal experiment.

As shown in Fig. 2 (top), the formal experiment on each of the 3 participation days consisted of 3 experimental drives, so each participant experienced 3 × 3 = 9 drives. A 2–3 min break was inserted between two consecutive experimental drives, during which the participant was allowed to drink water and use the restroom if needed. Within each 30 min experimental drive (Fig. 2, middle), the same 4 TOR events (Fig. 2, bottom) were presented after 2 min, 4 min, 8 min, and 16 min of automated driving. The order of the 4 durations was shuffled across experimental drives and counterbalanced across participants and participation days, so that participants could not predict the occurrence of a TOR.

Fig. 2. Illustrative sketch of the experimental design. Top: Each participant experienced a total of 9 experimental drives on 3 participation days. Middle: Within each of the experimental drives, 4 TOR events were arranged after four drive durations: 2-min, 4-min, 8-min, and 16-min. The grey strips show the TOR events and the subsequent manual driving epochs. Note that the pattern of TORs shown in the figure (i.e., the ordering of 4-min, 2-min, 8-min, and 16-min) differed in other experimental drives. Bottom: traffic scenario of the TOR event.

Instructions to Participants.

Participants were briefed about the capability of the ADS. They were told that the ADS can perform both longitudinal and lateral control of the vehicle, but that a TOR may be issued when the ADS reaches its limit. During automated driving mode, they were instructed to relax and to keep their hands off the steering wheel and their feet off the pedals. Drivers were also instructed not to fall completely asleep, since they would be responsible for manually driving the vehicle when a TOR was issued.

Scenario of TOR Event.

The scenario was built on a simulated Japanese dual two-lane motorway. To promote the development of driver drowsiness, the motorway was designed to be consistently straight, and the landscapes were very monotonous. The simulated ego-vehicle in the automated mode generally ran in the left lane with a cruising speed of 60 km/h. The takeover event was designed as a scenario involving 5 vehicles (Fig. 2, bottom). The vehicle running in front of the ego-vehicle changed to the right lane when a malfunctioning vehicle appeared in the left lane. At a speed of 60 km/h and a headway of 100 m, the time budget for takeover was 6 s. There were another two vehicles running behind the ego-vehicle, one in the left lane and the other in the right lane. After TOR was issued, the drivers had to confirm the position of the two vehicles and manually change to the right lane to avoid a collision with the malfunctioning vehicle. TOR was issued in two modalities, both auditory (a voice saying “Take over control” in Japanese) and visual (the color of the ADS symbol on the dashboard changing from green to orange).

2.4 Driving Performance Measures

Following the analysis in Wu et al. (2019b), this study also characterized driver’s performance in response to TOR by fast reaction, smooth maneuvers, and maintenance of an adequate safety margin. The following measures were defined.

Reaction Time to Take Over Request (RTtor).

RTtor was defined as the time elapsed from the TOR until the driver began a conscious maneuver. A conscious maneuver was identified when a steering or braking input exceeded a threshold. Following Petermeijer et al. (2017) and Gold et al. (2013), two components were defined: RTsteer was the time elapsed until the steering wheel was turned 2°, and RTbrake was the time elapsed until the brake pedal was pressed to 10% of the full braking range. The smaller of the two values RTsteer and RTbrake was considered the reaction time to TOR (RTtor).
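As an illustration, the threshold-based definition above can be sketched in a few lines of Python. This is a minimal sketch rather than the analysis code used in the study: the function names are hypothetical, time is assumed to be measured in seconds from TOR onset, the steering angle is assumed to be expressed relative to its value at TOR onset, and a sample that exactly equals a threshold is assumed to count as a crossing.

```python
from typing import Optional, Sequence

STEER_THRESHOLD_DEG = 2.0  # steering wheel angle threshold stated in the text
BRAKE_THRESHOLD = 0.10     # 10% of the full braking range, stated in the text


def first_crossing(times: Sequence[float], values: Sequence[float],
                   threshold: float) -> Optional[float]:
    """Return the first time at which |value| reaches the threshold, else None."""
    for t, v in zip(times, values):
        if abs(v) >= threshold:
            return t
    return None


def reaction_time(times: Sequence[float], steer_deg: Sequence[float],
                  brake_frac: Sequence[float]) -> Optional[float]:
    """RTtor as the smaller of RTsteer and RTbrake (None if neither occurs)."""
    rt_steer = first_crossing(times, steer_deg, STEER_THRESHOLD_DEG)
    rt_brake = first_crossing(times, brake_frac, BRAKE_THRESHOLD)
    candidates = [rt for rt in (rt_steer, rt_brake) if rt is not None]
    return min(candidates) if candidates else None


# Hypothetical 3 s trace sampled at 60 Hz: braking starts at 1.5 s,
# steering starts at 2.0 s, so the brake component determines RTtor.
times = [i / 60 for i in range(180)]
steer = [0.0] * 120 + [3.0] * 60
brake = [0.0] * 90 + [0.2] * 90
print(reaction_time(times, steer, brake))  # 1.5
```

Returning None when neither threshold is crossed keeps events without any driver input (for example, a driver who remained asleep) distinguishable from fast reactions.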

Minimum Speed (MinSpeed).

MinSpeed was defined as the minimum vehicle speed before changing to the right lane. In the simulator, the vehicle dynamics were exactly the same across all the drives. A smaller MinSpeed indicates a harder braking input. This measure reflects takeover smoothness in the longitudinal direction.

Standard Deviation of Steering Wheel Position (SDsteer).

SDsteer was defined as the standard deviation of the steering wheel position within the first one-second epoch after changing to the right lane. A larger SDsteer value indicates more and/or larger steering operations in the running lane. This measure reflects takeover smoothness in the lateral direction.
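The two smoothness measures can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: the function names are hypothetical, each series is assumed to start at TOR onset, the sample index of the lane change is assumed to be known already (the text does not specify how that moment was detected), and the sample standard deviation is used because the text does not say which estimator was applied.

```python
import statistics
from typing import Sequence

SAMPLE_RATE_HZ = 60  # simulator sampling frequency reported in the text


def min_speed(speed_kmh: Sequence[float], lane_change_idx: int) -> float:
    """Minimum speed from TOR onset (index 0) up to the lane change."""
    return min(speed_kmh[: lane_change_idx + 1])


def sd_steer(steer_deg: Sequence[float], lane_change_idx: int,
             window_s: float = 1.0) -> float:
    """Sample standard deviation of steering angle after the lane change."""
    n = int(window_s * SAMPLE_RATE_HZ)  # number of samples in the window
    window = steer_deg[lane_change_idx: lane_change_idx + n]
    return statistics.stdev(window)     # needs at least two samples


# Hypothetical traces: the driver brakes to 55 km/h before the lane change.
speed = [60.0, 58.0, 55.0, 56.0, 57.0]
print(min_speed(speed, lane_change_idx=3))  # 55.0
```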

Minimum TTC (MinTTC).

MinTTC was defined as the time to collision (TTC) at the moment of the lane change, computed as the headway divided by the vehicle velocity. TTC at the moment of the lane change reflects the safety margin in response to TOR.
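The quotient above can be written as a short function. This is a minimal sketch with a hypothetical function name; it assumes that the headway is given in meters, the speed in km/h, and that the obstacle ahead is stationary, so that the ego vehicle's speed equals the closing speed.

```python
def time_to_collision(headway_m: float, speed_kmh: float) -> float:
    """TTC in seconds: headway divided by vehicle velocity."""
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    if speed_ms <= 0:
        return float("inf")     # a stopped vehicle never reaches the obstacle
    return headway_m / speed_ms


# Cross-check against the scenario description: 100 m headway at 60 km/h.
print(round(time_to_collision(100.0, 60.0), 2))  # 6.0
```

The same quotient reproduces the 6 s time budget stated for the takeover scenario.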

3 Results

The dataset includes the data of drowsiness level measures and the data of takeover performance measures. Drowsiness levels were measured by drivers’ subjective reports (KSS) as well as the rating scales of the two trained experts. Takeover performance was measured by reaction time measures (RTtor), smoothness measures (MinSpeed, SDsteer), and safety measures (MinTTC). Among the 40 participants, 39 participants completed all 9 experimental drives, and one participant completed 8 experimental drives. Each experimental drive had four TOR events after four different durations of automated driving (2 min, 4 min, 8 min, or 16 min), resulting in a total of (39 × 9) + (1 × 8) = 359 sets of data under each of the 4 driving duration conditions. The total number of datasets was 1436.

Shapiro-Wilk tests (Razali and Wah 2011) were conducted to test the normality of the data. If the sample of one measure did not differ significantly from a normal distribution, a parametric analysis of variance (ANOVA) was conducted for this measure (F-distribution statistics). Otherwise, a non-parametric Kruskal-Wallis H test was conducted (chi-squared distribution statistics). Post-hoc multiple comparisons were conducted using the Tukey-Kramer method. The significance level was set at 0.05.
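Because most measures in this study were analyzed with the Kruskal-Wallis H test, a standard-library sketch of that test may help readers reproduce the reported χ2 statistics. This is not the authors' code, and in practice a statistics package would normally be used. The function names `kruskal_wallis` and `chi2_sf` are hypothetical, and the Shapiro-Wilk, ANOVA, and post-hoc steps are omitted.

```python
import math
from itertools import chain


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable with integer df."""
    z = x / 2
    if df % 2 == 0:
        terms = sum(z**k / math.factorial(k) for k in range(df // 2))
        return math.exp(-z) * terms
    half_terms = sum(
        z ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df + 1) // 2)
    )
    return math.erfc(math.sqrt(z)) + math.exp(-z) * half_terms


def kruskal_wallis(*groups):
    """Return the tie-corrected H statistic and its chi-squared p-value."""
    data = sorted(chain.from_iterable(groups))
    n = len(data)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:  # assign the mean rank to each run of tied values
        j = i
        while j < n and data[j] == data[i]:
            j += 1
        rank_of[data[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    if tie_term == n**3 - n:
        raise ValueError("all observations are identical")
    rank_sums = (sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * sum(rank_sums) - 3 * (n + 1)
    h /= 1 - tie_term / (n**3 - n)  # tie correction
    return h, chi2_sf(h, len(groups) - 1)


# Toy data (not from the study): three fully separated groups.
h, p = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 2), round(p, 4))
```

With four duration conditions the statistic is referred to a chi-squared distribution with 3 degrees of freedom, and with three drowsiness levels to one with 2 degrees of freedom.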

3.1 Driving-Duration Based Analysis

KSS scores were treated as a subjective measure of drowsiness, and the experts’ ratings were treated as an objective measure. As shown in Fig. 3, average KSS scores increased with driving duration. Figure 4 shows the results for drowsiness level evaluated by the two experts. The colored bars show percentages of expert estimations of drowsiness level (low, medium, high) under each condition. For example, in the 2-min condition, expert 1 (Fig. 4a) rated 69.8% of drivers’ drowsiness levels as low, 20.1% as medium, and 10.1% as high. As driving duration increased, both experts rated more datasets as high in drowsiness and fewer as low in drowsiness.

Fig. 3. Average KSS scores under the four driving-duration conditions. Error bars indicate standard deviations.

Fig. 4. Percentages of drowsiness levels under the four driving-duration conditions rated by (a) expert 1 and (b) expert 2. The colored bars show the percentages of expert estimations of drowsiness level as low (0–1), medium (2–3), and high (4–5) under each condition. The solid line shows the mean of the drowsiness scale (Table 1, 0–5 scale), and error bars indicate standard deviations.

To statistically examine the effects of driving duration on both drowsiness measures and driving performance measures, parametric or non-parametric analyses of variance were conducted. Because neither the KSS scores nor the expert-rated drowsiness levels passed the normality test, non-parametric tests were conducted. A significant main effect (χ² = 13.6, p = 0.004) of driving duration on KSS scores indicated that drivers reported a higher level of drowsiness after a longer duration of automated driving (Fig. 3). Significant main effects were also found for drowsiness levels rated by both expert 1 (Fig. 4(a), χ² = 17.7, p = 0.005) and expert 2 (Fig. 4(b), χ² = 25.6, p < 0.001).

For driving performance measures, the MinTTC data passed the normality test, while the other data did not. A parametric ANOVA was conducted for MinTTC, and non-parametric Kruskal-Wallis H tests were conducted for the other measures. Although significant effects of driving duration on driver drowsiness levels were confirmed, no effects of driving duration were found for RTtor (Fig. 5, χ² = 2.52, p = 0.47), MinSpeed (Fig. 6, χ² = 1.28, p = 0.73), SDsteer (Fig. 6, χ² = 3.35, p = 0.34), or MinTTC (Fig. 7, F = 0.1, p = 0.96). As shown in Figs. 5, 6 and 7, no obvious changes in driving performance measures with increasing driving duration could be confirmed. No significant differences were found for any paired comparison, as indicated by the overlapping notches between different conditions.

Fig. 5. Measure of reaction time to TOR under the four driving-duration conditions. This boxplot was produced using the MATLAB boxplot function. On each box, the central line indicates the median, and the bottom and top edges of the box indicate the 25th (Q1) and 75th (Q3) percentiles, respectively. The whiskers extend to the most extreme data points not considered outliers, whose upper limit is Q3 + 1.5 × (Q3 − Q1) and lower limit is Q1 − 1.5 × (Q3 − Q1). The notch indicates the 95% confidence interval of the median. Two medians are significantly different at the 5% level if their notch intervals do not overlap.

Fig. 6. Measures of maneuvering smoothness after TOR under the four driving-duration conditions. This boxplot was produced using the MATLAB boxplot function. Explanation of boxplots can be found in the caption of Fig. 5.

Fig. 7. Measure of safety in responding to TOR under the four driving-duration conditions. This boxplot was produced using the MATLAB boxplot function. Explanation of boxplots can be found in the caption of Fig. 5.

3.2 Drowsiness-Level Based Analysis

To further investigate the effects of drowsiness level on takeover performance, datasets for specific drowsiness levels were selected to conduct a drowsiness-level based analysis. A set of data was included if the two trained experts rated it at the same level (low, medium, or high) of drowsiness, and this set of data was then labelled with the rated level. Consequently, 287 datasets were labelled as low drowsiness, 262 datasets as medium drowsiness, and 184 datasets as high drowsiness, representing a total of 51% of the 1436 datasets. Among all the available datasets, 3 crash events occurred. Both of the trained experts rated the 3 crash events with a drowsiness level of 5; in other words, the participants fell asleep before the TOR and crashed afterwards. The crash events were excluded from an analysis of variance conducted to check whether driver performance differed significantly under the three drowsiness conditions.
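The agreement-based selection described above can be sketched as follows. This is a hedged illustration, not the authors' procedure in code form: the function names are hypothetical, and each dataset is assumed to carry one 0–5 rating per expert, grouped into low (0–1), medium (2–3), and high (4–5) as defined in Sect. 2.2.

```python
from typing import List, Optional, Sequence, Tuple


def category(level: int) -> str:
    """Map an expert rating on the 0-5 scale to low / medium / high."""
    if not 0 <= level <= 5:
        raise ValueError(f"rating must be in 0..5, got {level}")
    if level <= 1:
        return "low"
    if level <= 3:
        return "medium"
    return "high"


def label_by_agreement(
    ratings: Sequence[Tuple[int, int]]
) -> List[Optional[str]]:
    """Label each (expert 1, expert 2) pair, or None if the categories differ."""
    labels = []
    for r1, r2 in ratings:
        c1, c2 = category(r1), category(r2)
        labels.append(c1 if c1 == c2 else None)
    return labels


# Hypothetical ratings: agreement is required at the category level only,
# so (3, 2) is kept as "medium" while (1, 2) is discarded.
print(label_by_agreement([(0, 1), (1, 2), (3, 2), (4, 5), (5, 3)]))
# ['low', None, 'medium', 'high', None]
```

The reported counts are consistent with the stated share: (287 + 262 + 184) / 1436 = 733 / 1436 ≈ 0.51.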

The results for driver reaction time are shown in Fig. 8. Average RTtor under low, medium, and high drowsiness conditions was 1.97 ± 0.40 s, 1.93 ± 0.38 s, and 2.18 ± 0.72 s, respectively. Because the RTtor data did not pass the normality test, a non-parametric Kruskal-Wallis H test was conducted. A significant main effect of drowsiness level on RTtor (χ² = 26.9, p < 0.001) was found. Post-hoc multiple comparisons revealed that drivers reacted significantly more slowly under the high drowsiness condition than under the medium or low drowsiness conditions, and there was no significant difference between the low and medium drowsiness conditions.

Fig. 8. Measure of reaction time to TOR under low, medium, and high drowsiness conditions. This boxplot was produced using the MATLAB boxplot function. Explanation of boxplots can be found in the caption of Fig. 5.

The results for driver maneuvering smoothness are shown in Fig. 9. Average MinSpeed under low, medium, and high drowsiness conditions was 55.1 ± 2.2 km/h, 54.6 ± 3.1 km/h, and 53.2 ± 5.5 km/h, respectively. Because the MinSpeed data did not pass the normality test, a non-parametric Kruskal-Wallis H test was conducted. A significant main effect of drowsiness level on MinSpeed (χ² = 38.8, p < 0.001) was found. Post-hoc multiple comparisons revealed that drivers drove significantly more slowly under high drowsiness conditions than under medium or low drowsiness conditions, and a significant difference between low and medium drowsiness conditions was also found. Average SDsteer under low, medium, and high drowsiness conditions was 1.88 ± 1.36°, 1.90 ± 1.35°, and 2.28 ± 2.63°, respectively. Because SDsteer data did not pass the normality test, a non-parametric Kruskal-Wallis H test was conducted. The main effect of drowsiness level on SDsteer was marginally significant (χ² = 4.47, p = 0.10).

Fig. 9. Measures of maneuvering smoothness after TOR under low, medium, and high drowsiness conditions. This boxplot was produced using the MATLAB boxplot function. Explanation of boxplots can be found in the caption of Fig. 5.

The results for minimum time to collision during TOR events are shown in Fig. 10. Average MinTTC under low, medium, and high drowsiness conditions was 1.80 ± 0.44 s, 1.88 ± 0.45 s, and 1.71 ± 0.52 s, respectively. Because MinTTC data did not pass the normality test, a non-parametric Kruskal-Wallis H test was conducted. A significant main effect of drowsiness level on MinTTC (χ² = 11.8, p = 0.002) was found. Post-hoc multiple comparisons revealed a significant difference only between the high and medium drowsiness conditions.

Fig. 10. Measure of safety in responding to TOR under low, medium, and high drowsiness conditions. This boxplot was produced using the MATLAB boxplot function. Explanation of boxplots can be found in the caption of Fig. 5.

4 Discussion

We first address the two research questions and then turn to a general discussion and the limitations of the study.

4.1 After a Longer Driving Duration, a Higher Level of Drowsiness Was Confirmed, but Takeover Performance Did not Deteriorate

As shown in Fig. 3 and Fig. 4, both self-reported KSS scores and the percentages of expert estimations of high drowsiness increased with driving duration, indicating that a simple manipulation of driving duration was able to induce statistically different levels of drowsiness. We assumed that a longer driving duration would not always lead to a high drowsiness level for all participants. This assumption was supported by the levels of drowsiness rated by the two trained experts. Only 22% (expert 1) and 28% (expert 2) of driver states were rated as highly drowsy after 16 min of automated driving. Conversely, expert 1 and expert 2 rated 10% and 12% of driver states as highly drowsy after only 2 min of automated driving. The presence of both highly and slightly drowsy drivers in the same driving-duration condition was therefore confirmed. Thus, averaging takeover performance data within each driving-duration condition can neutralize the difference between highly drowsy and slightly drowsy drivers, leading to the erroneous conclusion that drowsiness does not affect drivers’ takeover performance.

4.2 Effects of Drowsiness on Takeover Performance are Multi-staged

By selecting and analyzing datasets assigned to specific drowsiness levels, we were also able to examine the effects of different drowsiness levels on takeover performance. Under medium drowsiness conditions, driver takeover quality deteriorated compared to low drowsiness conditions (Fig. 9), while driver reaction times to TOR did not significantly change (Fig. 8). This result is in line with the findings of Gonçalves et al. (2016) and Feldhütter et al. (2018, 2019). If we look only at average values, drivers seemed to react faster and were able to maintain a larger TTC when moderately drowsy than when alert. We concur with Feldhütter et al. (2018, 2019) that drivers tend to sacrifice takeover quality (i.e., by harsh braking or unstable steering) to achieve a fast reaction and maintain an adequate safety margin.

In this study, a high drowsiness level was defined as the state of being fully asleep or in micro-sleep (frequent occurrence of eye closure for 2–3 s). Under high drowsiness conditions, evidence of significantly slower reactions as well as worse takeover quality was found. Further, three crash events occurred when both of the trained experts rated the driver as fully asleep (scale level 5). These findings support the idea that the effects of drowsiness on takeover performance may progress in multiple stages: when moderately drowsy, drivers were still able to sacrifice takeover quality to achieve faster reactions and to ensure safety; when drowsiness evolved to a high level, drivers reacted more slowly and more roughly to TORs; and when fully asleep, drivers’ crash probability also increased.

This multi-staged effect may also partially explain the non-significance of reaction time after predefined durations of automated driving (Figs. 5, 6 and 7). Moderately drowsy drivers tended to have reaction times similar to those of slightly drowsy drivers. When TORs were triggered after 2 min or 16 min of automated driving, high drowsiness data and medium drowsiness data may have coexisted, and when reaction times were averaged, significant differences between these driving-duration conditions may have disappeared.

4.3 Practical Applications and Limitations

The current findings provide the following evidence for researchers investigating the effects of drowsiness level on takeover performance:

  • Following Feldhütter et al. (2018), we endorse an experimental design in which TOR is triggered by drowsy state rather than driving duration.

  • Drowsiness should be treated as a multi-level state instead of a dichotomous (drowsy or not drowsy) state.

  • Extremely drowsy conditions should also be considered.

The current findings also have the following implications for designers who intend to support fallback-ready drivers in automated driving:

  • Driver assistance systems should support moderately drowsy drivers in improving takeover quality.

  • Safety countermeasures are necessary for TORs that happen when the driver is highly drowsy.

Finally, it is important to acknowledge the limitations of this study. First, the longest driving duration adopted in this experiment was 16 min. In the study of Gonçalves et al. (2016), a majority of the subjects reported a high level of drowsiness before 15 min of automated driving. It is possible that more drivers would become highly drowsy after a longer duration. Although this was not evaluated in the current study, individual differences can be significant in the development of drowsiness, and drowsiness can fluctuate over time (Karrer et al. 2004). For example, while 60 min may be the minimum duration needed to elicit drowsiness in one driver, another driver may already have taken a nap and become alert again within 60 min. Second, the datasets used in the drowsiness-level based analysis were those with clear drowsiness labels and represented 51% of all the datasets. Because the two trained experts estimated driver drowsiness online, we did not require them to repeat their estimates until they provided consistent ratings. The authors hope to overcome these limitations in future research.