Introduction

Over the last 50 years, the number and complexity of epidemiologic studies have grown and the demand for participants has risen [1]. However, willingness to volunteer for scientific activities has declined, which is reflected in decreasing response rates [1,2,3]. An initial refusal to participate in a study should therefore not necessarily be interpreted as a general refusal to take part in the study itself. Rather, constraints on participants’ time and availability might make study demands appear too high. Making clinical examinations more efficient and attractive, using multimedia options, and offering examinations closer to the participants’ place of residence in a digital form or on a mobile platform might thus improve participation rates.

Digital solutions are already in use for survey-based research and are also increasingly applied to patient-reported outcome measures (PROMs). Beyond self-reported measures, wearables and smartphone applications are promising candidates that may also facilitate mobile measurement of medical variables [4, 5]. Another approach is taken by the Preventiometer (Fig. 1) [5, 6]. It is an interactive multi-device platform designed to assess prevention-related medical variables such as blood pressure, body fat, and pulse oximetry. During examinations, the participant sits in a padded seat and looks at the inner side of a dome onto which videos are projected (see Fig. 1). These videos contain instructions and background information on the examinations. The participant can control the procedure by pressing two buttons integrated into the armrest of the seat. The entire examination is accompanied by a study nurse who operates the control computer of the Preventiometer and monitors the measurement processes. The Preventiometer can be installed on a mobile platform (e.g. a bus or van) to enable examinations closer to the participants’ place of residence. While the virtual assistant may contribute to a higher degree of standardization, the uncommon examination environment might also induce excitement, thereby affecting clinical measurements.

Fig. 1 The mobile Preventiometer installed in a bus (Preventiometer 1)

Acceptance of the Preventiometer by participants was previously assessed in a wellness context at Mayo clinics [7, 8]. Participants agreed or strongly agreed that it was both comfortable and engaging. In our current project Prävention für Arbeitnehmer zur Reduktion von Krankheitstagen durch Motivation und Verhaltensänderung ([preventive healthcare for workers with the aim to reduce absenteeism by motivation and behavior] PAKt-MV) [9] we evaluated the accuracy of the central measurement device as related results were not available from other studies.

In Study 1, we estimated the reliability by measuring participants twice within a Preventiometer and assessed the agreement between the repeated measurements. In Study 2, we estimated the agreement of Preventiometer measurements with measurements of similar variables obtained in the examination center of a population-based cohort study, the Study of Health in Pomerania (SHIP) [10,11,12]. In both studies, only those examinations and variables of the Preventiometer that had a comparable examination in SHIP were included.

Study 1: agreement of repeated measurements within Preventiometers (reliability)

The goal of Study 1 was to estimate the reliability of the Preventiometer. Two Preventiometers at different locations in different environments were used in this study. One was installed on a mobile platform in a bus and the other one was stationary in a room of the local hospital. A stationary Preventiometer was used because only one bus was available. Participants were tested twice (repeated measurement) with one of the Preventiometers. For efficiency and comparability between Study 1 and Study 2, we selected for Study 1 only measures that were available for both the Preventiometer and SHIP.

Materials and methods

Study sample

A convenience sample of 22 males and 53 females with a mean age of 41.7 years (SD = 13.3) in the range from 18 to 71 years participated. All participants were recruited among employees of the University Medicine Greifswald and their families or acquaintances. All participants gave written informed consent. The Ethics Committee of the University Medicine Greifswald approved the study protocol.

Preventiometer

Two Preventiometers were used for the reliability assessment. The first Preventiometer was installed in an articulated bus (Mercedes-Benz Citaro G, Evobus) at the premises of the University Medicine Greifswald as part of the mobile preventive healthcare project PAKt-MV. It will be referred to as the mobile Preventiometer. The second Preventiometer was installed in an office within the Department of General Practice. It will be referred to as the stationary Preventiometer. Five examinations of the Preventiometer were comparable to examinations of SHIP (see Study 2): Somatometry, blood pressure measurement, body fat measurement, pulse oximetry and spirometry (see Table 1 for a detailed overview). Because somatometric examinations were conducted outside the Preventiometer device, they were only assessed once and are therefore not subject to reliability analysis.

Table 1 Comparable examinations and the corresponding measurement instruments of the Preventiometer and SHIP

Examinations within the Preventiometer were conducted by study nurses who were first trained in the SHIP examination center for basic examinations (somatometry, blood pressure measurement, and spirometry) and then trained by instructors from the manufacturer of the Preventiometer.

Design

Study 1 followed a repeated measurement design, i.e. each participant was examined twice in a Preventiometer in immediate succession. The examinations within Preventiometers were always conducted in the following order: somatometry (only at the first measurement occasion), blood pressure and body fat measurement, pulse oximetry and spirometry. A subset of the participants (n = 22 with a mean age of 32.7 [SD = 8.65], consisting of 7 males and 15 females) was examined twice in each Preventiometer in immediate succession, thus contributing data for the analysis of both Preventiometers (in contrast to participants who were tested twice in only one of the Preventiometers). The clinical measurements in the Preventiometer are described in detail below.

Somatometry

Height was measured using a stadiometer. Participants were asked to remove their shoes for this measurement. For the waist and hip circumferences, a simple measuring tape was used. For weighing, participants stripped down to their underwear.

Blood pressure and body fat measurement

Systolic and diastolic blood pressure and body fat percentage were both measured with the OEM version of the HealthGuard-15 Portable Health Kiosk. It consists of an oscillometric blood pressure measurement device and a near-infrared interactance body fat measurement device [13]. The cuff for the blood pressure measurement was applied to the left arm and the body fat sensor to the triceps of the right arm of the participant. Both measurements were taken simultaneously. They were taken after non-exhausting activities (i.e., somatometry), but no specified resting phase was implemented. This procedure followed the suggestions of the manufacturer.

Pulse oximetry

For pulse oximetry, a Nonin 3231 USB pulse oximeter was attached to the right index finger of the participant.

Spirometry

Spirometric parameters were measured with the Carefusion SpiroUSB spirometer. At least three expiratory maneuvers were conducted from which the best trial was selected to determine the spirometric parameters of interest. The procedure followed a detailed SOP that was in line with German guidelines [14] as far as the expiratory part of spirometry is concerned.

Statistical analysis

We evaluated the reliability of measurements by means of intra-class correlation coefficients (ICC) based on a two-way random effects model with absolute agreement and single measurement [15]. We considered ICCs ≥ 0.70 as indicative of acceptable reliability [16]. Additionally, we report the variance components (VC) for persons, replications and residuals, estimated with the ICC function from the R package psych, to allow for a differentiation of systematic and random measurement error, as well as the standard error of measurement for agreement (SEMagreement) as proposed by de Vet et al. [17]. Furthermore, we computed the mean of the differences between repeated measurements within participants (i.e. bias), the standardized mean difference (SMD), and the limits of agreement (LoA). The SMD was computed as the bias divided by the standard deviation of the differences, and the LoA were computed as the bias ± 1.96 times the standard deviation of the differences between the first and second measurements.
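The quantities named above can be illustrated with a minimal sketch. It is not the authors' R code: it assumes the classical ANOVA (mean-square) estimators for a two-way random effects model, whereas the psych package may estimate the variance components differently. The function name `icc_two_way_random` and the example data are hypothetical.

```python
import math


def icc_two_way_random(data):
    """ICC (absolute agreement, single measurement), variance components
    and SEM_agreement for an n x k table of measurements.

    `data` holds one row per participant and one column per replication.
    Assumption of this sketch: ANOVA mean-square estimators are used.
    """
    n = len(data)      # participants
    k = len(data[0])   # replications per participant
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Mean squares for persons (rows), replications (columns) and residual
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Variance components (estimates can be negative in small samples)
    var_persons = (ms_rows - ms_err) / k
    var_replications = (ms_cols - ms_err) / n   # systematic error
    var_residual = ms_err                       # random error

    total = var_persons + var_replications + var_residual
    return {
        "icc": var_persons / total,
        "var_persons": var_persons,
        "var_replications": var_replications,
        "var_residual": var_residual,
        # SEM for agreement combines systematic and random error variance
        "sem_agreement": math.sqrt(var_replications + var_residual),
    }


# Hypothetical data: four participants, each measured twice
example = [[10, 12], [20, 22], [30, 32], [40, 42]]
result = icc_two_way_random(example)
print(result)
```

In this form the ICC is the share of person variance in the total variance, so a systematic shift between the first and second measurement lowers the ICC even when the rank order of participants is unchanged.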

Finally, we plotted the differences against the averages according to Bland and Altman [18] to allow for a visual inspection of (dis-)agreement between the measurements. All analyses were conducted separately for the mobile and the stationary Preventiometer.
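As a rough illustration of the bias, SMD and limits of agreement described in this section, the following sketch computes them together with the points of a Bland–Altman plot. The sign convention (first minus second measurement), the function name `bland_altman_summary` and the example values are assumptions made for illustration and are not taken from the study data.

```python
import statistics


def bland_altman_summary(first, second):
    """Bias, SMD, limits of agreement and plot coordinates for paired data.

    Assumption: differences are taken as first minus second measurement.
    """
    diffs = [a - b for a, b in zip(first, second)]
    averages = [(a + b) / 2 for a, b in zip(first, second)]

    bias = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)

    return {
        "bias": bias,
        "sd_diff": sd_diff,
        "smd": bias / sd_diff,               # undefined if all differences are equal
        "loa_lower": bias - 1.96 * sd_diff,
        "loa_upper": bias + 1.96 * sd_diff,
        # x = average of the pair, y = difference of the pair
        "points": list(zip(averages, diffs)),
    }


# Hypothetical repeated measurements for four participants
first = [10, 20, 30, 40]
second = [12, 21, 33, 42]
summary = bland_altman_summary(first, second)
print(summary["bias"], summary["loa_lower"], summary["loa_upper"])
```

The `points` entry can be passed to any plotting library; horizontal lines at `bias`, `loa_lower` and `loa_upper` then complete the plot.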

All data were complete. All calculations were performed with the statistical software R [19] and additional R packages [20,21,22,23,24,25].

Results

All examinations had ICCs above 0.70 (see Table 2). ICCs for diastolic blood pressure (mobile), body fat, heart rate (mobile) and spirometric variables surpassed 0.90. There were no substantial mean differences between the first and second measurement in the Bland–Altman plots (see Fig. 2 and Fig. 3). However, the observed extreme differences between observations primarily concerned the mobile Preventiometer. This is in line with the tendency of the variance component of the replications to be higher for the mobile Preventiometer in the case of blood pressure and heart rate measurements.

Table 2 Agreement between repeated measurements for mobile and stationary Preventiometers

Fig. 2 Bland–Altman plots for repeated measurements within Preventiometer 1 (mobile)

Fig. 3 Bland–Altman plots for repeated measurements within Preventiometer 2 (stationary)

Discussion

In both Preventiometers, retest-reliability estimates were excellent for body fat, vital capacity, and peak flow whereas agreement for the systolic blood pressure, diastolic blood pressure and heart rate was lower but still in the acceptable range [16].

To put our results in context, we compared them with results from other reliability studies (Table 3). Overall, reliability in terms of ICCs is mostly in line with comparable method comparison studies and can be regarded as sufficient, yet some discrepancies are noteworthy. For example, in the context of the HERITAGE family study [26], ICCs for blood pressure were somewhat smaller than in our study. This may be explained by the larger time interval between measurements in the HERITAGE study (one day vs. approximately one hour). The ICCs from a study evaluating the reliability of a predecessor of the body fat measurement device built into the Preventiometer [27] were slightly smaller than in our study. ICCs for heart rate measurements in our study lie in the middle of the range of ICCs that have been reported in two studies comparing different devices for the measurement of heart rate [4, 28]. Whereas the ICCs for peak flow (PEF) are in line with observed ICCs from other studies [29, 30], ICCs for FVC in our study are larger. This may be due to the shorter time interval between both measurements. Overall, the mean differences between the first and second measurements were small. Primarily in the mobile Preventiometer, heart rate seems to decrease slightly between the first and second measurement. This may reflect an adaptation to the new and mildly exciting examination context in the mobile Preventiometer.

Table 3 Reliability estimates from similar method comparison studies

Study 2: agreement between Preventiometer and SHIP measurements (validity)

The aim of Study 2 was to estimate the measurement agreement of Preventiometer examinations with comparable examinations in a population-based cohort study, the Study of Health in Pomerania (SHIP). This provides insights into the usability of Preventiometer measurements in place of SHIP measurements, for example when potential participants can be reached more easily through a mobile assessment close to their homes. SHIP comprises two cohorts, and a large range of health-related variables have been assessed. More details have been described elsewhere [10,11,12]. SHIP is subject to rigorous internal and external quality control. Therefore, data from SHIP were used as the reference for the Preventiometer.

Materials and methods

Study sample

In total, 155 (53% female) participants of the SHIP-Trend-1 cohort [11] with a mean age of 57 years (SD = 13) were enrolled. Recruitment for additional Preventiometer assessments took place at the SHIP examination center after participants completed their SHIP examinations on the same day.

All participants gave written informed consent. The Ethics Committee of the University Medicine Greifswald approved the study protocol.

Design

The design of Study 2 followed a method comparison study design with a single measurement on each method [32]. Participants were first examined in the SHIP study center and afterwards in one of the two Preventiometers. The time interval between the two measurements was about 1 to 6 h. Examinations in SHIP were conducted by certified SHIP examiners whereas examinations in the Preventiometers were performed by examiners of the project PAKt-MV who were trained both in the SHIP study center and on the Preventiometer.

Examinations

Examinations of the Preventiometer have been described in the methods section of Study 1. Detailed descriptions of SHIP examinations can be found elsewhere (e.g., blood pressure, height, weight, and waist circumference [33]; spirometry [34, 35]). A comparison of the instruments is displayed in Table 1. In the following section, we focus on methodological differences between Preventiometer and SHIP that might be of relevance for the evaluation of their agreement.

Somatometry

Whereas body height is measured with a mechanical stadiometer in the Preventiometer, it is measured via an ultrasound method in SHIP. Weight and waist circumference variables are measured using similar measurement techniques (see Table 1). Participants were asked to take off their shoes for height measurement and strip to their underwear for weight measurement.

Blood pressure measurement

Blood pressure is measured in the Preventiometer and SHIP by automatic oscillometric devices. However, in the Preventiometer, blood pressure is measured once without an explicit resting phase before the measurement, while blood pressure is measured three times in SHIP and the final value is computed as the mean of the second and third measurement. Before the first measurement, there is a five-minute resting phase in SHIP, and between the three measurements there are three-minute pauses. Finally, in the Preventiometer, blood pressure is measured on the left arm whereas in SHIP, blood pressure is measured on the right arm.
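The difference between the two protocols in how the final value is derived can be made concrete with a small sketch. The function names and readings are hypothetical; only the rule itself (mean of the second and third of three readings in SHIP, a single reading in the Preventiometer) is taken from the description above.

```python
def ship_final_value(readings):
    """Final SHIP value: mean of the second and third of three readings."""
    if len(readings) != 3:
        raise ValueError("the SHIP protocol described here uses three readings")
    return (readings[1] + readings[2]) / 2


def preventiometer_final_value(reading):
    """Final Preventiometer value: the single reading, taken without rest."""
    return reading


# Hypothetical systolic readings in mmHg
ship_value = ship_final_value([130, 124, 122])
preventiometer_value = preventiometer_final_value(131)
print(ship_value, preventiometer_value)
```

Discarding the first of three readings tends to remove an initial elevated value, which is one reason why a single reading without rest can come out higher than the averaged value.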

Body fat measurement

Body fat percentage is measured by a near infrared interactance device in the Preventiometer where a sensor is placed on the triceps of the participant. On the basis of this measurement, the fat percentage of the whole body is extrapolated. In contrast, in SHIP body fat percentage is measured using a Bod Pod, which uses air displacement plethysmography [36,37,38,39].

Pulse oximetry

Heart rate is measured by a pulse oximeter in the Preventiometer. In SHIP, heart rate is determined during the course of blood pressure measurement by the blood pressure device.

Spirometry

The spirometry device in the Preventiometer only recorded expiratory maneuvers and did not allow measurements of inspiratory maneuvers, while in SHIP both an inspiratory and an expiratory maneuver were conducted.

Statistical analysis

We evaluated the agreement between measurements analogous to Study 1. Again, all analyses were conducted separately for the mobile and the stationary Preventiometer.

We excluded five data pairs from the analyses. In two cases, body weight was measured fully clothed in the Preventiometer, which violated the study protocol. In another two cases, extreme differences for body height measurement (128.2 cm in the Preventiometer vs. 168 cm in SHIP and 159.5 cm in the Preventiometer vs. 170 cm in SHIP, respectively) were most likely due to data input errors in the Preventiometer. Finally, an extremely large difference for body weight measurement was detected (81.9 kg in the Preventiometer vs. 112.9 kg in SHIP). This was also attributed to a data input error in the Preventiometer. Additionally, there were a few missing comparisons per examination (see Table 4), which were due to occasional malfunctions of the Preventiometer and missing values in SHIP. All calculations were performed with the statistical software R [19] and additional packages [20,21,22,23,24,25].

Table 4 Agreement between Preventiometer (mobile and stationary) and SHIP measurements

Results

All ICCs were larger than 0.70, except for systolic blood pressure in the stationary Preventiometer and diastolic blood pressure in both Preventiometers.

Positive bias (i.e., Preventiometer measurements larger than SHIP measurements on average) was found for body height, body weight, systolic and diastolic blood pressure and heart rate (mobile Preventiometer). Negative bias (i.e., Preventiometer measurements smaller than their SHIP counterparts on average) was found for waist and hip circumference, vital capacity, peak flow and heart rate (stationary Preventiometer).

Comparing the Bland–Altman plots for hip and waist circumference for the mobile Preventiometer (Fig. 4), the width of the LoA for hip circumference measurements is mainly driven by some extremely large differences, even after the outlier elimination, whereas the width of the LoA for waist circumference measurements is based on a more consistent distribution of the differences. There is also evidence for proportional bias (i.e. a statistically significant slope in the regression of the differences on the averages) in the Bland–Altman plots of body height, diastolic blood pressure, body fat and vital capacity for the mobile Preventiometer. Regarding the stationary Preventiometer (Fig. 5), some extreme differences between measurements occurred that lie far outside the limits of agreement. In the cases of hip and waist circumference measurements, differences of around 20 cm occurred. For systolic blood pressure measurement, there are two differences around or even above 50 mmHg. This is also reflected in a much higher variance component of methods for the stationary Preventiometer for these measurements. Furthermore, there is evidence for proportional bias (see above) for body height, heart rate, body fat and vital capacity.
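Proportional bias, as defined here, can be checked by regressing the pairwise differences on the pairwise averages. The following is a minimal sketch using ordinary least squares; the function name `proportional_bias` and the data are hypothetical, and the sign convention (Preventiometer minus SHIP) is an assumption. The sketch returns the slope and its t statistic. Obtaining a p-value additionally requires the t distribution with n − 2 degrees of freedom, which is left to a statistics library.

```python
import math


def proportional_bias(method_a, method_b):
    """Regress differences (a - b) on averages ((a + b) / 2) by least squares.

    Returns slope, intercept, standard error of the slope and t statistic.
    Needs at least three pairs with differing averages and a non-perfect fit.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    avgs = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    n = len(diffs)

    mean_x = sum(avgs) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in avgs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(avgs, diffs))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance with n - 2 degrees of freedom
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(avgs, diffs))
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)

    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "t": slope / se_slope,
    }


# Hypothetical paired measurements (e.g. Preventiometer vs. SHIP)
preventiometer = [10.5, 21.0, 31.0, 42.0]
ship = [9.5, 19.0, 29.0, 38.0]
fit = proportional_bias(preventiometer, ship)
print(fit)
```

A slope near zero indicates that the disagreement does not depend on the magnitude of the measured value; a clearly non-zero slope means that the bias grows or shrinks across the measurement range.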

Fig. 4 Bland–Altman plots for the comparison between Preventiometer 1 (mobile) and SHIP measurements

Fig. 5 Bland–Altman plots for the comparison between Preventiometer 2 (stationary) and SHIP measurements

Discussion

In Study 2, we assessed the agreement of measurements from a mobile and a stationary Preventiometer with measurements obtained during SHIP examinations. While SHIP measurements can be conceived as a proxy for validity, there are two concerns that limit this interpretation: (1) Some of the measures, such as blood pressure, change over the course of the day. There were up to several hours between both measurements because participants were first fully examined in SHIP and afterwards in one of the Preventiometers. (2) Measurement protocols were not exactly the same.

Results from both Preventiometers were largely consistent. At least acceptable ICCs (> 0.70) were found for all variables except for blood pressure measurements, where ICCs between 0.5 and 0.6 occurred. In both Preventiometers, blood pressure measurements were higher compared to their SHIP counterparts whereas the opposite was true for spirometric measurements.

Table 5 displays an overview of results from method comparison studies with similar variables. Four studies reported ICCs and/or bias and limits of agreement for somatometric variables. The observed mean differences in our study for body height, body weight, hip, and waist measurements are not larger in comparison but the limits of agreement for hip and waist measurements are. The latter indicates the presence of more unsystematic measurement error in the Preventiometer assessment.

Table 5 Agreement and validity estimates from similar method comparison studies

Method comparison studies related to blood pressure measurement reported a wide range of agreement indices depending on the compared methods, the context of measurement, and the duration between measurements. The bias and limits of agreement we observed in our study lie at the upper end compared to these studies. The strict criterion proposed by the European Society of Hypertension, according to which 95% limits of agreement should not exceed 15 mmHg, was not met [57]. The observed differences may be explained by the procedural differences outlined above, particularly the lack of a systematic resting period prior to the measurements in the interest of shortening the examination time, and the time interval between Preventiometer and SHIP measurements.

ICCs for body fat seemed relatively low when compared to other measures. A study comparing near-infrared interactance (NIA)—the same method as implemented in the Preventiometer—and dual-energy X-ray absorptiometry (DXA) body fat measurement reported absolute bias and limits of agreement that fall into the same range as the present study [44]. However, the same study reported smaller absolute bias values and narrower limits of agreement when comparing bioelectrical impedance analysis (BIA) to DXA. In another study comparing BIA and calipometry to hydrodensitometry, even smaller bias values and narrower limits of agreement are reported [45]. The ICCs reported in a validation study evaluating the agreement between a commercial bioelectric impedance scale and calipometry are much higher than in the present study. Thus, our results are comparable to other studies using NIA, but better results might be achieved by using alternative methods of body fat measurement (BIA or calipometry).

Bias for heart rate measurement is comparable to other studies, yet limits of agreement in our study are much larger while ICCs are lower. This might be due to the comparatively large time interval between the Preventiometer and SHIP measurements and the lack of a resting phase before measurements in the Preventiometer.

Regarding spirometric measurements, estimates of bias and limits of agreement found in Study 2 were at the upper end of the range of what has been found in similar studies. One study also reports ICCs for peak flow measurements that are slightly higher than ICCs obtained in our study [24].

General discussion

Overall, Preventiometer examinations have adequate reliability according to conventional cut-offs [16], which is in line with results from comparable method comparison studies (Table 3). Yet, there are some issues to be overcome to increase the comparability of results with the conventional assessment of the studied biomarkers in a cohort study. Measurement agreement was acceptable for most examinations with the exception of blood pressure. The consistently higher blood pressure measurements in the Preventiometer may be dealt with by introducing a longer resting period before the measurement and by repeating measurements. In addition, the limits of agreement for most examinations were large compared to other method comparison studies dealing with similar variables. This likely reflects a relevant influence of random measurement error, which is also supported by the fact that variance components of methods were consistently smaller than variance components of residuals in the ICC models. However, one also has to take into account the natural variability of the clinical outcome: for example, systolic blood pressure, diastolic blood pressure, and pulse rate can be expected to have lower agreement than body fat or body weight because the underlying physiological magnitudes and processes are more volatile [58]. Thus, the comparatively low ICCs and large limits of agreement for blood pressure and heart rate may be partly explained by this variability. Another source of disagreement is probably rooted in the methodological and procedural differences described in the discussions of Study 1 and Study 2 (e.g., resting phases, time intervals). Therefore, better agreement between blood pressure measurements in the Preventiometer and SHIP may be expected if the procedures were harmonized.

In contrast to blood pressure and heart rate, natural variability may not explain discrepancies with regard to body fat measurements. The body fat measurement device in the Preventiometer only measures body fat values up to 45%, whereas the Bod Pod (SHIP) does not have this technical measurement limit. In the Bland–Altman plots for the comparisons of body fat measurement, this problem becomes visible in the form of the points lying on the decreasing line at the right end of the plot. However, we decided not to exclude these data points since this problem may arise in many application contexts with normal populations (which also include people with body fat percentages above 45%), and thus this technical measurement limit also impairs validity.
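Why a measurement ceiling produces a straight decreasing line in a Bland–Altman plot can be shown with a short sketch. It rests on a simplifying assumption that is not taken from the study: the capped device reports exactly 45% whenever the reference value exceeds 45%. In that case the difference is d = 45 − r and the average is m = (45 + r) / 2, so d = 90 − 2m, a line with slope −2. The function name and values are hypothetical.

```python
CAP = 45.0  # technical upper limit of the capped device, in percent body fat


def bland_altman_point(reference, cap=CAP):
    """Average and difference for one pair when the device reading is capped.

    Assumption of this sketch: below the cap the device equals the reference,
    above the cap it reports exactly the cap.
    """
    device = min(reference, cap)
    average = (device + reference) / 2
    difference = device - reference
    return average, difference


# Hypothetical reference values, three of them above the cap
for reference in [30.0, 40.0, 46.0, 50.0, 55.0]:
    average, difference = bland_altman_point(reference)
    on_line = reference > CAP and difference == 2 * CAP - 2 * average
    print(reference, average, difference, on_line)
```

Under this assumption every participant above the limit falls on the same line regardless of any other source of error, which matches the pattern described for the right end of the plot.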

To improve the comparability of the Preventiometer results, we suggest the following steps: (1) Blood pressure measurement should follow the procedures of available guidelines [59], that is, at least two successive measurements should be obtained and a resting period of 5 min should be implemented before the first measurement. (2) Spirometry should be extended by the inspiratory part of the examination as recommended in relevant guidelines. This has already been implemented in the course of PAKt-MV. (3) The body fat measurement device should be replaced by a more valid device. The current near-infrared interactance body fat device not only shows considerable disagreement with the Bod Pod device from SHIP but also has a technical measurement limit at 45% (see above). While near-infrared interactance is a very time-efficient measurement method to assess body fat, one should keep in mind that it is usually applied to one body point only, while the more valid and traditional skinfold method is applied to multiple body points and an algorithm is used to compute overall body fat [60]. Therefore, technical limitations notwithstanding, multiple body points might be measured with the near-infrared interactance method, thereby combining the time-efficiency of the near-infrared interactance method with the validity of the skinfold method. However, when testing the validity of multiple vs. single measuring points with the near-infrared interactance method, Heyward et al. [61] found only a small advantage of using multiple measuring points.

Limitations

Repeated measurements within a single study would have allowed for a variance decomposition and better estimation of the measurement error (a) due to the Preventiometer, (b) due to SHIP, and (c) due to the lack of agreement between Preventiometer and SHIP. However, logistical constraints required that SHIP participants could only be examined once, allowing for no variation of the sequential order of Preventiometer and SHIP examinations in Study 2, and the Preventiometer examinations always took place after the SHIP examinations. We did not cover all potential measurements of the Preventiometer [5, 6] because we focused on measurements comparable to SHIP. Measurement properties are of relevance to provide an informed overview of the usefulness of the Preventiometer for participants and researchers alike. Yet, other aspects beyond the scope of this paper are of relevance as well. The positive user experience [7, 8] has been commented upon. We were also able to perform assessments right at the workplace of participants, resulting in little to no travel time for them. Effects on response would need to be dealt with in a separate study. Another aspect is a formal comparison of staffing requirements. When using a bus, there must be a driver with an appropriate license. Overall, compared to stationary examinations, there may be few options to save personnel. On the other hand, a very important issue is resolved: all data are collected electronically and stored in a single database. Therefore, background IT infrastructure is provided, which is important from a provider perspective. In addition, a larger follow-up study is recommended once the issues raised here have been resolved.

Conclusion

The initial motivation of these studies was to evaluate the Preventiometer for use in a preventive health care project (PAKt-MV). Reliability is a prerequisite for the detection of change within subjects over time. In our current evaluation, we found the Preventiometer’s measurements sufficient in this regard. However, measurement agreement was insufficient for some measurements. While issues like the body fat measurements can be easily remedied by replacing the measurement device, the deviant blood pressure and pulse measures indicate a procedural issue. One of the reasons to use the Preventiometer is to save examination time, which benefits the examiners and the participants. Forgoing the recommended resting periods for measuring blood pressure and pulse rate can be seen as a trade-off exchanging validity for time. Our findings suggest that insufficient resting periods have a strong biasing impact, which makes a rather conservative trade-off preferable. Overall, methodological and technological improvements should be realized before using the Preventiometer in population-based research.