Investigating sources of inaccuracy in wearable optical heart rate sensors

Bent, Brinnae; Goldstein, Benjamin A.; Kibbe, Warren A.; Dunn, Jessilyn P.

doi:10.1038/s41746-020-0226-6

Article
Open access
Published: 10 February 2020

Volume 3, article number 18, (2020)

npj Digital Medicine

Matters Arising to this article was published on 26 February 2021

Abstract

As wearable technologies are being increasingly used for clinical research and healthcare, it is critical to understand their accuracy and determine how measurement errors may affect research conclusions and impact healthcare decision-making. Accuracy of wearable technologies has been a hotly debated topic in both the research and popular science literature. Currently, wearable technology companies are responsible for assessing and reporting the accuracy of their products, but little information about the evaluation method is made publicly available. Heart rate measurements from wearables are derived from photoplethysmography (PPG), an optical method for measuring changes in blood volume under the skin. Potential inaccuracies in PPG stem from three major areas, including (1) diverse skin types, (2) motion artifacts, and (3) signal crossover. To date, no study has systematically explored the accuracy of wearables across the full range of skin tones. Here, we explored heart rate and PPG data from consumer- and research-grade wearables under multiple circumstances to test whether and to what extent these inaccuracies exist. We saw no statistically significant difference in accuracy across skin tones, but we saw significant differences between devices and between activity types; notably, absolute error during activity was, on average, 30% higher than during rest. Our conclusions indicate that different wearables are all reasonably accurate at resting and prolonged elevated heart rate, but that differences exist between devices in responding to changes in activity. This has implications for researchers, clinicians, and consumers in drawing study conclusions, combining study results, and making health-related decisions using these devices.

Introduction

Wearable technology has the potential to transform healthcare and healthcare research by enabling accessible, continuous, and longitudinal health monitoring. With the number of chronically ill patients and health system utilization in the US at an all-time high,^1,2 the development of low-cost, convenient, and accurate health technologies is increasingly sought after to promote health as well as improve research and healthcare capabilities. It is expected that 121 million Americans will use wearable devices by 2021.³ The ubiquity of wearable technology provides an opportunity to revolutionize health care, particularly in communities with traditionally limited healthcare access.

The growing interest in using wearable technologies for clinical and research applications has accelerated the development of research-grade wearables to meet the needs of biomedical researchers for clinical research and digital biomarker development.⁴ Consumer-grade wearables, in contrast to research-grade wearables, are designed, developed, and marketed to consumers for personal use. While research- and consumer-grade wearables often contain the same sensors and are quite similar functionally, their markets and use cases are different, which may influence accuracy (Supplementary Table 1). Digital biomarkers are digitally collected data that are transformed into indicators of health outcomes. Digital biomarkers are expected to enable actionable health insights in real time and outside of the clinic. Both consumer- and research-grade wearables are frequently being used in research, with the most common brands being Fitbit (PubMed: 476 studies, ClinicalTrials.gov: 449 studies) for consumer-grade wearables and Empatica (PubMed: 22 studies, ClinicalTrials.gov: 22 studies) for research-grade wearables (Supplementary Table 2).

It is, therefore, of critical importance to evaluate the accuracy of the wearable technologies that are being used in clinical research, digital biomarker development, and personal health. The lack of clarity surrounding the verification and validation procedures and the unknown reliability of the data generated by these wearable technologies poses significant challenges for their adoption in research and healthcare applications.^4,5,6

Recently, the accuracy of wearable optical heart rate (HR) measurements using photoplethysmography (PPG) has been questioned extensively.^{7,8,9,10,11,12,13} Wearables manufacturers sometimes report some expected sources of error, but the reporting and evaluation methods are inconsistent^{14,15,16,17,18,19,20,21,22} (Table 1). Of particular interest, previous research demonstrated that inaccurate PPG HR measurements occur up to 15% more frequently in dark skin as compared to light skin, likely because darker skin contains more melanin which absorbs more green light than lighter skin.^{23,24,25,26,27,28,29,30,31} Interestingly, some manufacturers of wearable devices recommend using their device only in light skin tones and/or at rest.^17,32

Table 1 Reported accuracy, outliers, evaluation process, and factors that affect performance by each device manufacturer.

Another suspected measurement error in wrist-worn devices is motion artifact, which is typically caused by displacement of the PPG sensor over the skin, changes in skin deformation, blood flow dynamics, and ambient temperature.^33,34 Motion artifacts may manifest as missing or false beats which result in incorrect HR calculations.^35,36,37 Several studies have demonstrated that HR measurements from wearable devices are often less accurate during physical activity or cyclic wrist motions.^{8,11,35,38,39} Several research groups and manufacturers have identified that cyclical motion can affect accuracy of HR in wearable sensors.^9,10,15 The cyclical motion challenge has been described as a “signal crossover” effect wherein the optical HR sensors on wearables tend to lock on to the periodic signal stemming from the repetitive motion (e.g., walking and jogging) and mistake that motion as the cardiovascular cycle.⁴⁰

To date, no studies have systematically validated wearables under various movement conditions across the complete range of skin tones, and particularly on skin tones at the darkest end of the spectrum. Here, we present a comprehensive analysis of wearables HR measurement accuracy during various activities in a group of 53 individuals equally representing all skin tones. To our knowledge, this is the first reported characterization of wearable sensors across the complete range of skin tones. Validation of wearable devices during activity and across all skin tones is critical to enabling their equitable use in clinical and research applications.

Results

Study summary

A group of 53 individuals successfully completed the entire study protocol (32 females, 21 males; ages 18–54; equal distribution across the Fitzpatrick (FP) skin tone scale). This protocol was designed to assess error and reliability in a total of six wearable devices (four consumer-grade and two research-grade models) over the course of approximately 1 h (Fig. 1). Each round of the study protocol included (1) seated rest to measure baseline (4 min), (2) paced deep breathing⁴¹ (1 min), (3) physical activity (walking to increase HR up to 50% of the recommended maximum;⁴² 5 min), (4) seated rest (washout from physical activity) (~2 min), and (5) a typing task (1 min). This protocol was performed three times per study participant in order to test all devices. In each round, the participant wore multiple devices according to the following: Round 1: Empatica E4 + Apple Watch 4; Round 2: Fitbit Charge 2; Round 3: Garmin Vivosmart 3, ** for six wearable devices representing both consumer wearables and research-grade wearables. HR metrics are compared to the clinical-grade electrocardiogram (ECG) as the standard for heart rate measurement.

Potential relationships between error in HR measurements and (1) skin tone, (2) activity condition, (3) wearable device, and (4) wearable device category were examined using mixed effects statistical models. We developed comprehensive, individual, and interaction mixed effects models for the independent variables using mean HR measurement error as the dependent variable (Table 2). We found that wearable device, wearable device category, and activity condition all significantly correlated with HR measurement error, but changes in skin tone did not impact measurement error or wearable device accuracy.

Table 2 Results of mixed effects comprehensive and marginal models.

Wearables accuracy across skin tones

Anecdotal evidence and incidental study findings supported the hypothesis that PPG measurements may be less accurate on darker skin tones than on lighter skin tones.^{8,9,10,11,12,13} To systematically explore this hypothesis, we examined the mean directional error (MDE) and the mean absolute error (MAE) of HR measurements within each FP skin tone group at rest and during physical activity.

Among skin tone groups at rest, FP5 had the largest MDE across all devices and FP1 had the lowest MDE (−4.25 bpm and −0.53 bpm, respectively) (Supplementary Figs 1a, 2a, Supplementary Table 7a). In absolute error terms, the darkest skin tone (FP6) had the highest MAE and the second darkest skin tone (FP5) had the lowest MAE at rest (10.6 bpm and 8.6 bpm, respectively) (Fig. 2c, e, Supplementary Table 6a). The average MDE and MAE across all skin tone groups at rest were −2.99 bpm and 9.5 bpm, respectively. Among skin tone groups during activity, FP5 had the highest MDE and FP3 had the lowest MDE (9.21 bpm and 7.21 bpm, respectively; Fig. 2b, Supplementary Table 7b). FP4 had the highest MAE and FP3 had the lowest MAE (14.8 bpm and 10.1 bpm, respectively; Fig. 2d, f, Supplementary Table 6b). Skin tone appears to not be the driver of MAE or MDE.

Fig. 2 Error in heart rate across skin tones and devices at rest and during activity.

In the comprehensive and marginal mixed effects models, we found no significant correlation between skin tone and HR measurement error (Table 2). While we found no overall effect of skin tone, we tested whether the effect of skin tone differed based on individual devices. We did find a significant interaction between skin tone and device (Table 2). Upon further examination, this was shown to be based on the Biovotion device, which showed a decrease in resting HR and an increase in active HR (Fig. 2). During activity, the highest MDE occurs in FP5 and/or FP6 in all devices except for the ** (Supplementary Fig. 3) and found that MAE was higher during typing compared with rest in all devices, and often nearly as high as during walking, except for the Apple Watch and the Empatica E4 (Supplementary Fig. 3a). The MDE was higher during typing as compared with rest in the Miband, Empatica, and Biovotion. Interestingly, while both typing and walking had poor performance overall, walking tended to cause reported HR to be higher than true HR, whereas typing caused the reported HR to be lower than the true HR (Supplementary Fig. 3b). Surprisingly, the MAE and MDE were lower during deep breathing than at rest in all devices except for the Apple Watch, in which the deep breathing condition was the condition with the worst performance (Supplementary Fig. 3). During deep breathing, reported HR was generally lower than true HR (Supplementary Fig. 3).

Signal alignment

Lags between the ECG- and PPG-derived HR signals ranging between 0 and 43 s were discovered during our preliminary exploratory data analysis. These lags were inconsistent; in some cases, the lag was fixed and in other cases the lag was dynamic (Supplementary Fig. 5). The source of these lags could not be pinned down with certainty and may possibly be attributed to (1) misaligned time stamps (highly unlikely due to our time synchronization protocol described in the methods as well as the sometimes dynamic time lags observed), (2) data processing artifacts (uneven or delayed sampling, compute, and/or data reporting), (3) missed heart beats due to low frequency measurements by the wearable, or (4) a delay between the actual heart beat and the change in blood volume at the wrist.

In order to remove lag as a factor that could contribute to error calculated in the previous sections, we performed signal alignment using two different approaches (cross-correlation and smoothing with a rolling window) and recalculated MAE and MDE on the newly aligned signals (Supplementary Fig. 6). Using the updated MAE and MDE at each window size from the smoothing, we reanalyzed the relationships in the previous sections and found no differences in conclusions from the previous sections. Our model did show that window length is related to HR measurement error (Supplementary Table 9). We performed a sensitivity analysis to determine how smoothing could affect improvements in accuracy, and we found that in most cases, smoothing reduced HR measurement error, as demonstrated by the fact that the median optimal window size was >0 (Supplementary Fig. 7). MAE and MDE were in general improved the most by smaller window sizes (less smoothing) during activity and wider window sizes (more smoothing) at rest, likely because changes in activity intensity would not be captured by wider smoothing windows (Supplementary Fig. 7b). This did not hold true for the Apple Watch 4 and Empatica E4 for MDE or the Biovotion Everion for MAE.
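The cross-correlation approach to alignment can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes both HR series are already resampled to a common rate (so a lag in samples equals a lag in seconds at 1 Hz), that the wearable can only trail the ECG, and that the lag is fixed over the segment. The function name `best_lag` and the `max_lag` parameter are hypothetical.

```python
def best_lag(reference, delayed, max_lag):
    """Return the delay (in samples) of `delayed` relative to `reference`
    that maximizes the Pearson correlation over the overlapping samples.

    Assumption for this sketch: only non-negative lags are searched,
    i.e. the wearable signal may trail the ECG but never lead it.
    """
    def pearson(x, y):
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        sxx = sum((a - mean_x) ** 2 for a in x)
        syy = sum((b - mean_y) ** 2 for b in y)
        if sxx == 0 or syy == 0:
            return float("-inf")  # correlation undefined for a flat segment
        return sxy / (sxx * syy) ** 0.5

    best, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Overlap: reference[0:n] is compared with delayed[lag:lag + n].
        n = min(len(reference), len(delayed) - lag)
        if n < 3:
            break  # too few overlapping samples to correlate
        r = pearson(reference[:n], delayed[lag:lag + n])
        if r > best_r:
            best, best_r = lag, r
    return best
```

Once a lag is estimated, the wearable series can be shifted by that many samples before MAE and MDE are recalculated. A dynamic lag, as described above, would need the search to be repeated over shorter segments.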

Potential relationship between wearable device cost, market size, release year, and error

Wearables vary widely in terms of release year, data accessibility, and cost (Supplementary Table 1). We used devices across a wide range of costs, market sizes, and release times at the time of this study (Apple Watch 4, Fitbit Charge 2, Garmin Vivosmart 3, and ** ⁴⁷ Here, we explored one important aspect regarding the accuracy of wearables across the full range of skin tones. We found no statistically significant differences in wearable HR measurement accuracy across skin tones; however, we did find other sources of measurement inaccuracies, including activity type and type of device. Researchers, clinicians, and health consumers must recognize that the information derived from different wearables should not be weighted equally for drawing study conclusions, combining study results, and making health-related decisions. Algorithms that are used to calculate digital biomarkers should consider error and measurement quality under the various circumstances that we have shown in this study. Digital biomarker interpretation must take this data quality into account when making healthcare decisions.

Methods

Study population

In total, 56 participants (34 females, 22 males, 18–54 years of age, mean = 25.6, racial breakdown: 8 African American, 21 Asian, 8 Hispanic, and 19 Caucasian-White) were recruited for this study. Data from three participants were excluded from the study due to incomplete ECG records. The subjects all consented to the study and were compensated for their participation. The study was approved by the Institutional Review Board at Duke University and informed consent was obtained from all participants. We enrolled an approximately equal distribution of skin tones (FP1: 7, FP2: 8, FP3: 10, FP4: 9, FP5: 9, FP6: 10) on the FP skin tone scale, the standard skin tone scale with six categories of pigmentation^48,49 (one to six, one being the lightest and six being the darkest). Participants were excluded if they had skin conditions or sensitivities that would be exacerbated by wearing a wearable device/sensor and/or electrode pads or if they were taking medications/substances that affect HR (including, but not limited to, Adderall, performance enhancing drugs, human growth hormones, and illegal substances). The demographics from this study are shown in Supplementary Table 10.

Sample size

Based on our power analysis, we required ≥48 participants to achieve 80% power to reject the null hypothesis that there is no difference in PPG accuracy between skin tone groups (α = 0.5). Effect size for the power analysis was based on a pilot study examining differences in light absorption across skin tones²⁵ and was determined to be 0.3. Difference in green light mean absorption for different skin tones during activity was used to calculate effect size since optical HR measurements primarily measure green light absorption.^24,25 Based on the ANOVA power calculation, we required eight participants per skin tone category (6 skin tone categories on FP scale). We also performed a multiple regression power calculation (f² = 0.15, power = 0.8, α = 0.5) and determined the number of participants required was a total of 46 for the mixed effects model.

Devices and data collection protocol

We tested four consumer wearable devices used frequently in research studies, as shown in Fig. 1, including the Apple Watch 4 (Apple Inc., Cupertino, CA), Fitbit Charge 2 (Fitbit, Inc., San Francisco, CA), Garmin Vivosmart 3 (Garmin Ltd., Olathe, Kansas), ** task. During baseline, participants were asked to remain seated in a comfortable position for 4 min. This was followed immediately by a deep breathing exercise, where participants breathed in sync with a 1-min deep breathing video.⁵¹ Participants then participated in a walking activity for 5 min. Participant HR was monitored during this time to ensure that the participant reached 50% of their maximum HR and did not exceed their maximum HR (220 − age). A washout period of approximately 2 min occurred before participants began the typing task to ensure HR had returned to baseline. Participants typed on a mechanical computer keyboard (Dell Model: SK-8115) for 1 min before switching devices to begin the next phase.
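The HR bounds used during the walking activity follow directly from the age-based rule quoted above. The sketch below only restates that arithmetic; the function names are hypothetical, and HR is assumed to be in beats per minute with age in years.

```python
def max_heart_rate(age_years):
    """Age-predicted maximum HR (bpm) using the 220 - age rule from the protocol."""
    return 220 - age_years


def walking_target_heart_rate(age_years, fraction=0.5):
    """HR (bpm) a participant should reach while walking: 50% of maximum by default."""
    return fraction * max_heart_rate(age_years)
```

For a 30-year-old participant, this gives a maximum of 190 bpm and a walking target of 95 bpm.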

In the first phase, the Empatica E4 was placed on the right wrist and the Apple Watch 4 on the left wrist. In the second phase, the Fitbit Charge 2 was placed on the left wrist. During the third phase, the Garmin Vivosmart 3 was placed on the right wrist and the Xiaomi Miband 3 was placed on the left wrist. Participants wore the Biovotion on the upper right arm for all three phases, but data from only the last phase (Phase 3) were used in this study.

Time syncing and signal alignment

All wearable devices were connected to a Wi-Fi-only enabled mobile device (smart phone or laptop). In order to prevent desynchronization via internal clock time drift, at the start of each study, each wearable device was connected to a mobile device to synchronize the clock time following ISO 8601.^52,53,54,55 Prior to the start of each study, each mobile device was connected to the network to synchronize its internal clock time via the Network Time Protocol (NTPv4).⁵⁴ Once connected to the Wi-Fi and synchronized, the NTP client updates the mobile device clock approximately every 10 min.⁵⁶

The Apple Watch, Fitbit, Garmin, and Biovotion were connected to the iPhone SE (iOS), the Xiaomi Miband was connected to the Android Samsung Galaxy 4 mobile device, and the Bittium Faros and Empatica E4 were connected to a ThinkPad laptop running Windows 10. iPhones running iOS 5 and above automatically sync via NTP, and the settings in both Android and Windows 10 were set to ensure automatic NTP syncing upon Wi-Fi connection. Because the wearable devices used in this study were not precision instruments, processing lag times between devices that occur during the NTP sync may affect the device clock time by milliseconds.⁵⁷

Mixed effects modeling

To assess the impact of various factors and account for repeated measurements on participants, we used a mixed-model approach. We first fit a null model, shown in Eq. (1):

$$Y_{ij} = \alpha _i + s + c + d + \varepsilon _{ij},$$

(1)

where the observation (Y_ij) is the difference between ECG HR and wearable HR for each participant (i) at each timepoint (j), ε_ij accounts for the random noise, and the random effect parameter α_i accounts for participant-specific differences.

Next, we fit univariable models accounting for skin tone (s), condition (c) (rest, walking, deep breathing, and typing), and device (d), respectively

$$Y_{ij} = \alpha _i + s + \varepsilon _{ij},$$

(2)

$$Y_{ij} = \alpha _i + c + \varepsilon _{ij},$$

(3)

$$Y_{ij} = \alpha _i + d + \varepsilon _{ij}.$$

(4)

We also fit an interaction model to examine whether there is an interaction between skin tone and device factors, as shown in Eq. (5)

$$Y_{ij} = \alpha _i + s \ast d + \varepsilon _{ij}.$$

(5)

We assessed the added value of the factor via a likelihood ratio test, comparing the larger model to the null model. A significant p value indicates that the larger model provides a better fit. This is akin to repeated measures ANOVA. We used a p value < 0.0125, based on a Bonferroni correction to indicate significance (taking three factors—skin tone, device, and activity condition into account).

Differences algorithm

Raw ECG was processed using the clinical standard, Kubios HRV Premium (version 3.3) to extract RR intervals and HR. Differences between the ECG and each wearable sensor were calculated at each matched timestamp for each wearable sensor for each participant. Both relative and absolute differences were calculated as shown in Eqs. (6) and (7).

$${\rm Directional}\,{\rm difference\!:}\,{\rm HR}_{\rm ECG} - {\rm HR}_{\rm Wearable}.$$

(6)

$${\rm Absolute}\,{\rm difference\!:}\,\left| {\rm HR}_{\rm ECG} - {\rm HR}_{\rm Wearable} \right|.$$

(7)

Calculations of error

We have defined error as the difference between HR from the ECG and the wearable sensor. Thus, higher error indicates a larger difference between the wearable sensor and the “true” value from the ECG. Error was compared across skin tones using an unpaired, two-sided t test with Welch approximation and Bonferroni multiple hypothesis correction of 0.00028 (considering 6 choose 2 skin tone comparisons—15 skin tone comparisons × 6 devices × 2 conditions—rest and activity).

$${\rm Mean}\,{\rm directional}\,{\rm error}_{\rm participant}:\,\frac{{\sum} \left( {\rm HR}_{\rm ECG} - {\rm HR}_{\rm Wearable} \right)}{{\rm Number}_{{\rm matched}\,{\rm timestamps}}}.$$

(8)

$${\rm Mean}\,{\rm absolute}\,{\rm error}_{\rm participant}:\frac{{\sum} {\left| {\rm HR}_{\rm ECG} - {\rm HR}_{\rm Wearable} \right|}}{{\rm Number}_{{\rm matched}\,{\rm timestamps}}}.$$

(9)
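Equations (6) to (9) can be illustrated with a short sketch. It assumes each HR series is held as a mapping from timestamp to HR in bpm, which is a representation chosen for illustration and not taken from the study code; the function names are likewise hypothetical. The sign convention follows Eq. (6): ECG minus wearable.

```python
def match_timestamps(ecg_hr, wearable_hr):
    """Pair ECG and wearable HR values at timestamps present in both series.

    Both arguments are assumed to be dicts mapping timestamp -> HR (bpm).
    """
    shared = sorted(set(ecg_hr) & set(wearable_hr))
    return [(ecg_hr[t], wearable_hr[t]) for t in shared]


def mean_directional_error(pairs):
    """Eq. (8): mean of (HR_ECG - HR_Wearable) over matched timestamps."""
    return sum(ecg - wearable for ecg, wearable in pairs) / len(pairs)


def mean_absolute_error(pairs):
    """Eq. (9): mean of |HR_ECG - HR_Wearable| over matched timestamps."""
    return sum(abs(ecg - wearable) for ecg, wearable in pairs) / len(pairs)
```

Because positive and negative differences cancel in the directional error but not in the absolute error, a device can show an MDE near zero while still having a large MAE.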

Calculations of Missingness

Missingness is calculated from the expected sampling rate (study average sampling rate). The calculation used to determine Missingness (%) is shown in Eq. (10). Statistical differences between missingness for activity and baseline were calculated using paired, two-sided t tests with a Bonferroni multiple hypothesis correction (taking into account four devices, p value = 0.0125). Statistical differences between missingness for skin tones were calculated using unpaired, two-sided t tests between a skin tone and all other skin tones for each device with a Bonferroni multiple hypothesis corrected p value of 0.001 (taking into account four devices, six skin tones, and two conditions).

$${\rm Missingness}\,(\%):\,100 - \left( \frac{{\rm Actual}\,\#\,{\rm Samples}}{{\rm Expected}\,\#\,{\rm Samples}} \right) \ast 100.$$

(10)
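Equation (10) can be sketched as follows. The helper that derives the expected sample count from a sampling rate and a recording duration is an assumption about how the expected count might be obtained; the text only states that it comes from the study average sampling rate. All names are hypothetical.

```python
def expected_sample_count(sampling_rate_hz, duration_s):
    """Expected number of samples for a recording (assumed: rate x duration)."""
    return sampling_rate_hz * duration_s


def missingness_percent(actual_samples, expected_samples):
    """Eq. (10): percentage of expected samples that were not recorded."""
    if expected_samples <= 0:
        raise ValueError("expected_samples must be positive")
    return 100 - (actual_samples / expected_samples) * 100
```

For example, a 5-min walking segment at an assumed 1 Hz reporting rate has 300 expected samples, so 270 recorded samples correspond to roughly 10% missingness.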

Calculations and analysis of HRV

Because HRV requires access to raw, sample-level data that is not currently provided by most wearables, out of the six devices tested, we were limited to using only the Empatica E4 for the HRV accuracy analysis. HRV time-domain metrics from the Empatica device have been validated against ECG in previous studies.^58,59,60 Frequency domain metrics of HRV have not been sufficiently validated on wearable optical HR sensors, thus are excluded from this analysis.

HRV was only calculated during baseline due to motion artifacts affecting the signal. Raw ECG was processed using the clinical standard, Kubios HRV Premium (version 3.3) to extract RR intervals. PPG data from the Empatica E4 device is supplied as both raw PPG (green LED light only) and an inter beat interval (IBI) sequence. The IBI sequence provided by Empatica is obtained from their wristband-integrated processing algorithm that removes incorrect peaks due to noise in the raw PPG signal, which they compute from the red and green LEDs on the device. Red LED PPG signal is not saved or provided and is only used in the calculation of the provided IBI sequence.

We matched raw PPG and IBI sequences and removed data that the Empatica wristband-integrated processing algorithm removed onboard. Our updated PPG signal could then be used to extract IBI sequences for HRV calculations. A Kolmogorov–Zurbenko low pass linear filter and outlier removal were used to mitigate any additional motion artifact not removed by the Empatica processing algorithm. Following the process described by Empatica for determining their IBI sequence, local minima were detected using a rolling minimum detector and the IBI values were calculated as the difference between these local minima values. Outlier capping at 1.5 × IQR was performed for each downsampled signal.
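The minima-based IBI extraction and the outlier capping step can be sketched as below. This is an illustration under stated assumptions, not Empatica's algorithm or the study code: the window half-width, the requirement that a minimum be unique within its window, the interpretation of "capping" as clipping values to the fences at 1.5 × IQR beyond the quartiles, and the quartile convention used by `statistics.quantiles` are all choices made here. Function names are hypothetical.

```python
import statistics


def rolling_minima(signal, half_window):
    """Indices where the sample is the unique minimum of a centered window."""
    indices = []
    for i in range(half_window, len(signal) - half_window):
        window = signal[i - half_window:i + half_window + 1]
        if signal[i] == min(window) and window.count(signal[i]) == 1:
            indices.append(i)
    return indices


def inter_beat_intervals(minima_indices, sampling_rate_hz):
    """IBI in seconds: time between consecutive detected minima."""
    return [(b - a) / sampling_rate_hz
            for a, b in zip(minima_indices, minima_indices[1:])]


def cap_outliers(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (assumed meaning of capping)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]
```

The half-width of the window would need to be shorter than the shortest plausible beat interval at the signal's sampling rate, otherwise neighboring beats suppress each other.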

All calculations for time-domain HRV were performed with user-defined functions in Python (3.5.2) that were validated using Kubios HRV Premium (version 3.3). Error was calculated between the ECG HRV and the PPG HRV for each participant. Paired, two-sided t tests were performed with a significance threshold of Bonferroni-corrected p value of 0.0033 (considering 6 choose 2 skin tone comparisons—15 skin tone comparisons for each HRV metric).
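As an illustration of what user-defined time-domain HRV functions can look like, the sketch below computes three widely used metrics from an IBI sequence in milliseconds. The choice of SDNN, RMSSD, and pNN50 is an assumption; the text does not list which time-domain metrics were computed, and these functions are not the validated study code.

```python
import math
import statistics


def sdnn(ibi_ms):
    """Sample standard deviation of the inter-beat intervals (ms)."""
    return statistics.stdev(ibi_ms)


def rmssd(ibi_ms):
    """Root mean square of successive IBI differences (ms)."""
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def pnn50(ibi_ms):
    """Percentage of successive IBI differences larger than 50 ms."""
    diffs = [abs(b - a) for a, b in zip(ibi_ms, ibi_ms[1:])]
    return 100 * sum(d > 50 for d in diffs) / len(diffs)
```

The HRV error for a participant would then be the difference between a metric computed on the ECG RR intervals and the same metric computed on the PPG-derived IBI sequence.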

Lag time analysis using a rolling window approach

In order to examine the effect of lag time on our model, we iterated through rolling windows of 5, 10, 20, 30, 40, 50, 60, 90, 120, 150, 180, 210, 240 s for each participant, each device, and each condition (rest or activity). We found the optimal window length of MAE and MDE by determining the window length that minimized the MAE and MDE, respectively. We then repeated the mixed effects model, adding window length as an effect.
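The window search can be sketched as follows. Several details are assumptions for illustration: the series are taken to be sampled at 1 Hz so that window length in samples equals seconds, only the wearable series is smoothed, the window is a trailing mean, and a window of 0 stands for no smoothing. The function names are hypothetical.

```python
WINDOWS_S = [5, 10, 20, 30, 40, 50, 60, 90, 120, 150, 180, 210, 240]


def rolling_mean(values, window):
    """Trailing mean over the last `window` samples (fewer at the start)."""
    smoothed, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def mean_absolute_error(reference, estimate):
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


def optimal_window(ecg_hr, wearable_hr, windows=WINDOWS_S):
    """Return (best_window, scores): the window length that minimizes MAE.

    Windows longer than the recording are skipped; 0 means no smoothing.
    """
    scores = {0: mean_absolute_error(ecg_hr, wearable_hr)}
    for w in windows:
        if w <= len(wearable_hr):
            scores[w] = mean_absolute_error(ecg_hr, rolling_mean(wearable_hr, w))
    return min(scores, key=scores.get), scores
```

The same search with a signed error in place of the absolute error, minimizing its magnitude, would give the optimal window for MDE.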

Activity level disparity statistical testing

Mean relative error from ECG for both baseline and activity were recorded for each participant. Mean relative errors across participants were used for a paired, two-sided t test with Welch approximation and Bonferroni multiple comparison correction with an initial significance threshold of p < 0.05 and a Bonferroni-corrected p < 0.0042 (considering six devices and two conditions—rest and activity).
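The Bonferroni-corrected thresholds quoted in the Methods follow from dividing the initial threshold of 0.05 by the number of comparisons. The sketch below reproduces that arithmetic for three of the stated cases; the helper name is hypothetical.

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Pairwise comparisons among six Fitzpatrick skin tone groups: 6 choose 2 = 15.
SKIN_TONE_PAIRS = comb(6, 2)

# Error across skin tones: 15 pairs x 6 devices x 2 conditions = 180 tests.
error_threshold = bonferroni_threshold(0.05, SKIN_TONE_PAIRS * 6 * 2)

# HRV across skin tones: 15 pairwise comparisons per metric.
hrv_threshold = bonferroni_threshold(0.05, SKIN_TONE_PAIRS)

# Activity versus rest: 6 devices x 2 conditions = 12 tests.
activity_threshold = bonferroni_threshold(0.05, 6 * 2)
```

Rounded, these evaluate to 0.00028, 0.0033, and 0.0042, matching the thresholds reported in the corresponding sections.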

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The data sets generated during and/or analyzed during the current study will be submitted one year from the publication date to the PhysioNet public repository under the title BigIdeasLab_STEP.

Code availability

Code for the algorithm development, evaluation, and statistical analysis is open source with no restrictions and is available from https://github.com/Big-Ideas-Lab/STEP.

References

Chronic Diseases in America | CDC. https://www.cdc.gov/chronicdisease/resources/infographic/chronic-diseases.htm (2019).
Health Care Cost and Utilization Report 2017 (2019).
Older Americans Drive Growth of Wearables. eMarketer https://www.emarketer.com/content/older-americans-drive-growth-of-wearables (2018).
Munos, B. et al. Mobile health: the power of wearables, sensors, and apps to transform clinical trials. Ann. N. Y. Acad. Sci. 1375, 3–18 (2016).
Witt, D. R., Kellogg, R. A., Snyder, M. P. & Dunn, J. Windows into human health through wearables data analytics. Curr. Opin. Biomed. Eng. 9, 28–46 (2019).
Goldsack, J. et al. Verification, Analytical Validation, and Clinical Validation (V3): The Foundation of Determining Fit-for-Purpose for Biometric Monitoring Technologies (BioMeTs). Preprint at https://preprints.jmir.org/preprint/17264 (2020).
Hailu, R. Fitbits, other wearables may not accurately track heart rates in people of color. STAT News. https://www.statnews.com/2019/07/24/fitbit-accuracy-dark-skin/ (2019).
Jo, E., Lewis, K., Directo, D., Kim, M. J. & Dolezal, B. A. Validation of biofeedback wearables for photoplethysmographic heart rate tracking. J. Sports Sci. Med. 15, 540–547 (2016).
Reddy, R. K. et al. Accuracy of wrist-worn activity monitors during common daily physical activities and types of structured exercise: evaluation study. JMIR mHealth uHealth 6, e10338 (2018).
Sartor, F. et al. Wrist-worn optical and chest strap heart rate comparison in a heterogeneous sample of healthy individuals and in coronary artery disease patients. BMC Sports Sci. Med. Rehabil. 10, 10 (2018).
Shcherbina, A. et al. Accuracy in wrist-worn, sensor-based measurements of heart rate and energy expenditure in a diverse cohort. J. Pers. Med. 7, 3 (2017).
Wallen, M. P., Gomersall, S. R., Keating, S. E., Wisløff, U. & Coombes, J. S. Accuracy of heart rate watches: implications for weight management. PLoS ONE 11, e0154420 (2016).
Bickler, P. E., Feiner, J. R. & Severinghaus, J. W. Effects of skin pigmentation on pulse oximeter accuracy at low saturation. Anesthesiology 102, 715–9 (2005).
Apple Support. Your heart rate. What it means, and where on Apple Watch you’ll find it—Apple Support. https://support.apple.com/en-us/HT204666 (2019).
Apple Support. Get the most accurate measurements using your Apple Watch—Apple Support. https://support.apple.com/en-us/HT207941#heartrate (2019).
Fitbit. Purepulse Technology. https://www.fitbit.com/technology (2019).
Garmin. What Can Cause the Heart Rate Sensor from My Watch to Be Inaccurate? https://support.garmin.com/en-US/?faq=rxsywTpox9AHVzOXFxni59 (2019).
Garmin. Garmin | Accuracy Disclaimer. https://www.garmin.com/en-US/legal/atdisclaimer/ (2019).
Empatica Support. E4 data—Empatica Support. https://support.empatica.com/hc/en-us/articles/360029469772-E4-data-HR-csv-explanation (2019).
Biovotion. Everion data quality—Biovotion AG. https://biovotion.zendesk.com/hc/en-us/articles/212369329-Everion-data-quality- (2019).
Biovotion. The accuracy of Heart Rate—Biovotion AG. https://biovotion.zendesk.com/hc/en-us/articles/213794685-The-accuracy-of-Heart-Rate (2019).
Biovotion. What impacts and affects my vital sign readings?—Biovotion AG. https://biovotion.zendesk.com/hc/en-us/articles/360025102213-What-impacts-and-affects-my-vital-sign-readings- (2019).
Zonios, G., Bykowski, J. & Kollias, N. Skin melanin, hemoglobin, and light scattering properties can be quantitatively assessed in vivo using diffuse reflectance spectroscopy. J. Invest. Dermatol. 117, 1452–1457 (2001).
Tseng, S.-H., Grant, A. & Durkin, A. J. In vivo determination of skin near-infrared optical properties using diffuse optical spectroscopy. J. Biomed. Opt. 13, 014016 (2008).
Fallow, B. A., Tarumi, T. & Tanaka, H. Influence of skin type and wavelength on light wave reflectance. J. Clin. Monit. Comput. 27, 313–317 (2013).
Treesirichod, A., Chansakulporn, S. & Wattanapan, P. Correlation between skin color evaluation by skin color scale chart and narrowband reflectance spectrophotometer. Indian J. Dermatol. 59, 339–342 (2014).
Ries, A. L., Prewitt, L. M. & Johnson, J. J. Skin color and ear oximetry. Chest 96, 287–290 (1989).
Lister, T., Wright, P. A. & Chappell, P. H. Optical properties of human skin. J. Biomed. Opt. 17, 0909011 (2012).
Weiler, D. T., Villajuan, S. O., Edkins, L., Cleary, S. & Saleem, J. J. Wearable heart rate monitor technology accuracy in research: a comparative study between PPG and ECG technology. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 61, 1292–1296 (2017).
Nitzan, M., Romem, A. & Koppel, R. Pulse oximetry: fundamentals and technology update. Med. Devices Evid. Res. 7, 231 (2014).
Yan, L., Hu, S., Alzahrani, A., Alharbi, S. & Blanos, P. A Multi-wavelength opto-electronic patch sensor to effectively detect physiological changes against human skin types. Biosensors 7, 22 (2017).
Everion. Biovotion Instruction for Consumer (2018).
Castaneda, D., Esparza, A., Ghamari, M., Soltanpur, C. & Nazeran, H. A review on wearable photoplethysmography sensors and their potential future applications in health care. Int. J. Biosens. Bioelectron. 4, 195–202 (2018).
Lemay, M. et al. Chapter 2.3. Application of Optical Heart Rate Monitoring. https://doi.org/10.1016/B978-0-12-418662-0.00023-4 (2014).
Zhang, Y. et al. Motion artifact reduction for wrist-worn photoplethysmograph sensors based on different wavelengths. Sensors 19, 673 (2019).
Tautan, A.-M., Young, A., Wentink, E. & Wieringa, F. Characterization and reduction of motion artifacts in photoplethysmographic signals from a wrist-worn device. in 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 6146–6149. https://doi.org/10.1109/EMBC.2015.7319795 (IEEE, 2015).
Tamura, T. et al. Wearable photoplethysmographic sensors—past and present. Electronics 3, 282–302 (2014).
Zong, C. & Jafari, R. Robust heart rate estimation using wrist-based PPG signals in the presence of intense physical activities. in 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 8078–8082. https://doi.org/10.1109/EMBC.2015.7320268 (IEEE, 2015).
Reis, V. M. et al. Are wearable heart rate measurements accurate to estimate aerobic energy cost during low-intensity resistance exercise? PLoS ONE 14, e0221284 (2019).
Valencell Team. Valencell | Optical Heart Rate Monitoring: What You Need to Know. https://valencell.com/blog/2015/10/optical-heart-rate-monitoring-what-you-need-to-know/ (2015).
Six Dijkstra, M. et al. Exploring a 1-minute paced deep-breathing measurement of heart rate variability as part of a workers’ health assessment. Appl. Psychophysiol. Biofeedback 44, 83–96 (2019).
Mayo Clinic. Mayo Clinic Staff. Exercise intensity: how to measure it—Mayo Clinic. https://www.mayoclinic.org/healthy-lifestyle/fitness/in-depth/exercise-intensity/art-20046887 (2019).
Shaffer, F. & Ginsberg, J. P. An overview of heart rate variability metrics and norms. Front. Public Health 5, 258 (2017).
Nelson, B. W. & Allen, N. B. Accuracy of consumer wearable heart rate measurement during an ecologically valid 24-hour period: intraindividual validation study. JMIR mHealth uHealth 7, e10828 (2019).
Wang, R. et al. Accuracy of wrist-worn heart rate monitors. JAMA Cardiol. 2, 104 (2017).
Prawiro, E. A. P. J., Yeh, C.-I., Chou, N.-K., Lee, M.-W. & Lin, Y.-H. Integrated wearable system for monitoring heart rate and step during physical activity. Mob. Inf. Syst. 2016, 1–10 (2016).
Weiss, D. et al. Innovative technologies and social inequalities in health: a scoping review of the literature. PLoS ONE 13 (2018).
Sachdeva, S. Fitzpatrick skin typing: applications in dermatology. Indian J. Dermatol. Venereol. Leprol. 75, 93–96 (2009).
Fitzpatrick, T. B. The validity and practicality of sun-reactive skin types I through VI. Arch. Dermatol. 124, 869–871 (1988).
CTA Standard. Physical Activity Monitoring for Heart Rate CTA-2065 (2018).
YouTube. Erin Klassen. Triangle breathing, 1 min—YouTube. https://www.youtube.com/watch?v=u9Q8D6n-3qw (2015).
Adams, C. & Pinkas, D. Internet X.509 Public Key Infrastructure Time-Stamp Protocol (TSP). https://dl.acm.org/doi/book/10.17487/RFC3161.
ISO—ISO 8601 Date and time format. https://www.iso.org/iso-8601-date-and-time-format.html (2019).
Kevin Dooley. The Why and How of Syncing Clocks on Network Devices. https://www.auvik.com/franklymsp/blog/syncing-clocks-network-devices/ (2017).
John Martellaro. What Time is it? Your iPad & iOS 5 Finally Knows—The Mac Observer. https://www.macobserver.com/tmo/article/what_time_is_it_your_ipad_ios_5_finally_knows (2011).
TechTarget. What is Network Time Protocol (NTP)?—Definition from WhatIs.com. https://searchnetworking.techtarget.com/definition/Network-Time-Protocol (2019).
Todd Johnson. Devices and Timestamps: Seriously Though, WTF? | Airship. https://www.airship.com/blog/devices-and-timestamps-seriously-though-wtf/ (2016).
Corino, V. D. A., Matteucci, M. & Mainardi, L. T. Analysis of heart rate variability to predict patient age in a healthy population. Methods Inf. Med. 46, 191–5 (2007).
McCarthy, C., Pradhan, N., Redpath, C. & Adler, A. Validation of the Empatica E4 wristband. in 2016 IEEE EMBS International Student Conference (ISC) 1–4. https://doi.org/10.1109/EMBSISC.2016.7508621 (IEEE, 2016).
Hugh et al. Consumer-Grade Wrist-worn Ppg Sensors Can Be Used to Detect Differences in Heart Rate Variability among A Heterogenous Prediabetic Population (IEEE International Conference on Biomedical and Health Informatics, 2018).

Acknowledgements

The authors would like to thank Dr. Hwanhee Hong for input on the statistical design and Weihsien (Willy) Lee for assistance with study participant recruitment. B.B. is funded by the Duke FORGE Fellowship.

Author information

Authors and Affiliations

Department of Biomedical Engineering, Duke University, Durham, NC, USA
Brinnae Bent & Jessilyn P. Dunn
Department of Bioinformatics and Biostatistics, Duke University, Durham, NC, USA
Benjamin A. Goldstein, Warren A. Kibbe & Jessilyn P. Dunn

Authors

Brinnae Bent
Benjamin A. Goldstein
Warren A. Kibbe
Jessilyn P. Dunn

Contributions

B.B. was involved in the study design, data collection, data analysis and interpretation, and paper preparation. J.D. was involved in the study design and funding, data analysis and interpretation, and paper preparation. B.A.G. and W.A.K. were involved in the data interpretation and paper preparation.

Corresponding author

Correspondence to Jessilyn P. Dunn.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Material

NR Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Bent, B., Goldstein, B.A., Kibbe, W.A. et al. Investigating sources of inaccuracy in wearable optical heart rate sensors. npj Digit. Med. 3, 18 (2020). https://doi.org/10.1038/s41746-020-0226-6

Received: 01 October 2019
Accepted: 17 January 2020
Published: 10 February 2020
DOI: https://doi.org/10.1038/s41746-020-0226-6
Springer Nature Limited
