INTRODUCTION

The world’s population of individuals aged > 60 years will almost double between 2015 and 2050.1 As the population ages, functional disability becomes more prevalent. In the USA, 41.7% of people aged ≥ 65 years reported having one or more disabilities.2 Functional disability is associated with adverse outcomes such as decreased quality of life and increased risks of hospitalization and mortality.3 However, functional decline during aging is dynamic and can be reversed. A meta-analysis reported that 13.7% of older adults improved their frailty status during a mean follow-up of 3.9 years.4 Therefore, it is important to identify older adults at high risk of functional disability and to take preventive measures for them at an early stage.

Several attempts have been made to predict functional disability and other functional outcomes in older populations. A recent review identified 43 studies that predicted the functional status of community-dwelling older adults.5 In Japan, the Ministry of Health, Labour and Welfare developed a 25-item basic function checklist (the Kihon Checklist [KCL] in Japanese) to identify older adults at high risk of needing long-term care. Tsuji and colleagues developed a ten-item risk score to predict functional disability within 3 years using data from the Japan Gerontological Evaluation Study (JAGES).6 Despite this literature, two major research gaps remain. First, the variables in the existing models were selected on the basis of expert knowledge and previous literature. Researchers can handle only a limited number of potential variables and may overlook essential ones. Machine learning algorithms can select variables from many candidates without relying on a priori hypotheses or assumptions and may improve the performance of prediction models; however, none of the aforementioned studies used these methods. Second, most previous studies had short follow-up periods, and only four studies, all from European countries, followed participants for more than 5 years.5 Because preventive measures for functional disability often take time to show effects, a model that can predict disability over a longer horizon is needed.

To fill these research gaps, the present study constructed prediction models of functional disability from 183 candidate predictors using machine learning algorithms. We followed functionally and cognitively independent Japanese older adults to evaluate the performance of the prediction models. To the best of our knowledge, no previous study has used machine learning algorithms to predict functional disability over more than 5 years among community-dwelling older adults.

METHODS

Baseline Survey

We used data from the JAGES, which studies Japanese people aged ≥ 65 years who are not certified as needing long-term care. Self-report questionnaires were mailed to 112,705 residents in 19 municipalities across nine prefectures from October to December 2013. In ten large municipalities, participants were randomly sampled, whereas in the other, smaller municipalities, all eligible residents were surveyed. Of the invited individuals, 79,291 responded (response rate, 70.4%). The analysis excluded 4994 respondents whose sex and age could not be verified. All participants provided informed consent, and the study protocol was reviewed and approved by the ethics committees of Kyoto University (R3153-2) and Nihon Fukushi University (13–14).

Functional Disability

The onset of functional disability was defined as new certification of needing long-term care and was ascertained by linking participants to the public long-term care insurance registries administered by each municipality. This definition of functional disability has been widely used in previous studies.6,7,8,9,10 All Japanese citizens aged ≥ 40 years are enrolled in public long-term care insurance and are eligible for benefits if they are determined to need care.11 Through a nationally standardized protocol, applicants are classified into one of the following eight levels: not certified, support-needs levels 1–2, and care-needs levels 1–5 (larger numbers indicate more severe disability; see Supplementary Table 1 for details).12,13 The level is determined on the basis of the estimated time needed for care, derived from home-visit and computer-based assessments, a primary physician’s written opinion, and deliberation by a committee. In this study, participants certified at any of the seven levels of needing care (support-needs levels 1–2 or care-needs levels 1–5) were considered to have functional disability. The follow-up period started between October and November 2013 and ended between March 2019 and March 2021 (mean follow-up, 5.4 years). Of the 74,297 eligible respondents, 73,262 were successfully linked to the administrative records (follow-up rate = 98.6%). Figure 1 shows a flowchart of the analytic sample.

Figure 1 Flowchart of the analytic sample.

Candidate Predictors

We considered for inclusion in the prediction models all variables constructed from questions that the JAGES asked all participants. The 183 candidate variables covered demographic characteristics, socioeconomic status, self-reported physical and mental health, health behaviors, social capital, and community environment (see Supplementary Table 2 for the list of candidate variables). To make variables measured on different scales comparable in the machine learning algorithms, each variable was normalized to a range from zero to one.
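The text does not name the normalization method. Min-max scaling, x_scaled = (x − min) / (max − min), is the usual way to map a variable onto the range zero to one, so the minimal sketch below assumes that method. It also assumes that missing values (None) are left untouched for later imputation and that a constant variable is mapped to 0; the function name and example values are hypothetical.

```python
from typing import List, Optional


def min_max_normalize(values: List[Optional[float]]) -> List[Optional[float]]:
    """Rescale one variable to [0, 1] with min-max scaling (assumed method).

    Missing values (None) stay None so they can be imputed afterwards.
    A constant variable (max == min) is mapped to 0.0 to avoid division by zero.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    lo, hi = min(observed), max(observed)
    span = hi - lo
    result: List[Optional[float]] = []
    for v in values:
        if v is None:
            result.append(None)
        elif span == 0:
            result.append(0.0)
        else:
            result.append((v - lo) / span)
    return result


# Hypothetical example: age in years, with one missing value
ages = [65, 70, None, 85]
print(min_max_normalize(ages))  # [0.0, 0.25, None, 1.0]

# Hypothetical 4-level item where higher scores indicate poorer status
self_rated_health = [1, 4, 2, 3]
print(min_max_normalize(self_rated_health))
```

Because the minimum and maximum are taken from observed values only, the scaled variable always spans exactly zero to one, whatever its original units.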

Statistical Analysis

In general, parametric methods outperform non-parametric methods when the relationship between an outcome and a predictor is linear, whereas non-parametric methods tend to perform better when the relationship is non-linear.14 Thus, we examined the predictive performance of one parametric and three non-parametric machine learning algorithms: ridge regression, gradient boosting, random forest, and eXtreme Gradient Boosting (XGBoost). These algorithms are easy to implement with standard statistical packages and have been widely used.14,15 We performed logistic regression with ridge regularization, which prevents overfitting by penalizing large coefficients.16,17 Gradient boosting18 and random forest19 are non-parametric ensemble methods that combine multiple decision trees to prevent overfitting. Whereas gradient boosting combines decision trees by boosting (each new tree is trained to correct the errors of the previously trained trees), random forest uses bagging (bootstrap aggregation of independently grown trees). XGBoost is a newer, fast implementation of gradient boosting that adds regularization.20
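The ridge penalty can be illustrated with a minimal from-scratch sketch: logistic regression fitted by batch gradient descent on the mean log-loss plus λ Σ β_j², with the intercept left unpenalized. This is an illustration of the technique only; the study’s software, solver, and penalty strength are not reported here, and the function names, toy data, learning rate, and iteration count are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ridge_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float,
    lr: float = 0.5,
    n_iter: int = 2000,
) -> Tuple[float, List[float]]:
    """Minimize mean log-loss + lam * sum(beta_j ** 2) by batch gradient descent.

    Returns (intercept, coefficients); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # p_i - y_i is the derivative of the log-loss w.r.t. the linear predictor
            err = sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, xi))) - yi
            grad0 += err
            for j in range(p):
                grad[j] += err * xi[j]
        b0 -= lr * grad0 / n
        for j in range(p):
            # The 2 * lam * beta_j term pulls large coefficients toward zero
            beta[j] -= lr * (grad[j] / n + 2.0 * lam * beta[j])
    return b0, beta


# Hypothetical toy data: two predictors already scaled to [0, 1]
X = [[0.0, 0.5], [0.1, 0.2], [0.2, 0.8], [0.3, 0.4], [0.4, 0.9], [0.5, 0.1],
     [0.6, 0.6], [0.7, 0.3], [0.8, 0.7], [0.9, 0.0], [1.0, 0.5]]
y = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

for lam in (0.001, 0.1, 1.0):
    _, coefs = fit_ridge_logistic(X, y, lam)
    norm = math.sqrt(sum(c * c for c in coefs))
    print(f"lambda={lam}: coefficients={[round(c, 3) for c in coefs]}, norm={norm:.3f}")
```

Running the loop shows the coefficient vector shrinking as λ grows, which is the overfitting control the paragraph refers to; in practice λ would be tuned rather than fixed.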

For all models, we performed threefold cross-validation: the dataset was randomly split into three groups, and each group was used once as test data while the remaining two groups were used as training data. This process was repeated ten times. We then calculated the feature importance of the predictors; for ridge regression, it represents the absolute coefficients, whereas for the tree-based algorithms, it represents the relative reduction in the Gini index due to splits on a given predictor. To facilitate implementation of the model, we developed a simplified risk score for functional disability based on the feature importance from ridge regression (the non-linearity of the other, non-parametric models makes it difficult to derive a simple risk score from them). We selected the ten most important features in the ridge regression and assigned a score of 1 to the tenth feature. The other features were then assigned scores proportional to their importance, with decimals rounded off. In the dataset, 4.9% of the values were missing. To reduce potential bias due to missing values, we imputed them using a random forest algorithm.21 All analyses were performed using Python 3 (CreateSpace, Scotts Valley, CA, USA).
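The repeated threefold cross-validation described above can be sketched as an index generator: each of the ten repeats reshuffles the rows, cuts them into three folds, and uses each fold once as test data, giving 30 train/test splits in total. The helper below is hypothetical and written only for illustration (scikit-learn’s `RepeatedKFold(n_splits=3, n_repeats=10)` provides the same scheme, though the text does not say which tool was used); the seed and sample size are arbitrary.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold_indices(
    n: int, n_splits: int = 3, n_repeats: int = 10, seed: int = 0
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (repeat, train_idx, test_idx) for repeated k-fold cross-validation.

    Each repeat reshuffles the row indices and cuts them into n_splits folds;
    every fold serves once as test data while the other folds form the training data.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        # Spread rows evenly: the first (n % n_splits) folds get one extra row
        base, extra = divmod(n, n_splits)
        folds: List[List[int]] = []
        start = 0
        for k in range(n_splits):
            size = base + (1 if k < extra else 0)
            folds.append(idx[start:start + size])
            start += size
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield repeat, train_idx, test_idx


splits = list(repeated_kfold_indices(n=9, n_splits=3, n_repeats=10, seed=42))
print(len(splits))  # 30 train/test splits = 3 folds x 10 repeats
print(splits[0][2])  # test rows of the first fold in the first repeat
```

A model would be fitted on each `train_idx` and evaluated on the matching `test_idx`; how the 30 resulting C statistics were summarized is not stated in the text.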

RESULTS

Table 1 presents the baseline characteristics of the participants. For categorical variables, high scores indicate poor outcomes. Among the participants, 16,361 (22.3%) newly developed disability (i.e., needed long-term care) during the study period. Compared with those who remained independent, participants who became disabled were on average more than 6 years older, lived with fewer household members, had lower household income, and rated their health as poorer at baseline. They were also more likely to receive public assistance; to self-report needing assistance with basic activities of daily living (e.g., walking, bathing, and using a toilet); to have experienced falls within 1 year, worry about falling, and feel bothersome; and to have been diagnosed with dementia, Parkinson’s disease, or blood and immune diseases. In addition, participants who became disabled were less likely than those who remained independent to be able to climb stairs and stand up without support, to engage in moderate physical activity (e.g., walking at a brisk pace, dancing, gymnastics, golf, farming, gardening, and car washing), and to drive. Supplementary Fig. 1 describes the distribution of certified levels of needing long-term care during follow-up.

Table 1 Participants’ Characteristics

Table 2 compares the performance of the prediction models. Ridge regression showed the best performance in predicting functional disability (C statistic = 0.818), and gradient boosting performed almost as well (0.817). Figure 2 shows the ten most important features of these two models. Both models identified age, self-rated health, variables related to falls and postural stability, and diagnoses of Parkinson’s disease and dementia as important features. In the ridge regression, household characteristics such as the number of household members, income, and receipt of public assistance were also important features (Fig. 2A). In the gradient boosting model, moderate physical activity and driving also predicted functional disability (Fig. 2B).
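The C statistic for a binary outcome equals the probability that a randomly chosen participant who became disabled received a higher predicted risk than a randomly chosen participant who did not, with ties counted as one half. The sketch below computes it directly from that pairwise definition; the function name and toy data are hypothetical.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """C statistic (area under the ROC curve) for a binary outcome.

    Over all pairs of one case (y = 1) and one non-case (y = 0), count the pairs
    in which the case has the higher predicted risk; ties count as 0.5.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for sc in cases:
        for sn in controls:
            if sc > sn:
                concordant += 1.0
            elif sc == sn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data: 1 = became disabled during follow-up
disabled = [0, 0, 1, 1, 0, 1]
predicted_risk = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
print(c_statistic(disabled, predicted_risk))  # 8 of 9 case/non-case pairs are concordant
```

The double loop makes the definition explicit but grows with the product of cases and non-cases; for tens of thousands of participants a rank-based implementation, such as scikit-learn’s `roc_auc_score`, is far faster.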

Table 2 Prediction Performance for Functional Disability
Figure 2 Ten important features in the prediction models. BADL, basic activities of daily living; PA, physical activity. Household income was equivalized by dividing it by the square root of the number of household members. Feature importance represents absolute coefficients in the ridge regression and relative reductions in the Gini index due to splits on a given predictor in the gradient boosting model.
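The Gini-based importance in the caption is built from one quantity per split: the size-weighted drop in Gini impurity, 1 − Σ p_k², from a parent node to its two children. The sketch below computes that quantity for a single split. An impurity-based importance sums these drops over every split on a predictor across all trees and rescales the totals to sum to one; implementations differ in the impurity measure they accumulate, so this is a sketch of the Gini version named in the text, not of any specific package. Function names and toy labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a node's class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini_decrease(
    parent: Sequence[int], left: Sequence[int], right: Sequence[int]
) -> float:
    """N_parent * G(parent) - N_left * G(left) - N_right * G(right) for one split."""
    if len(left) + len(right) != len(parent):
        raise ValueError("left and right must partition the parent node")
    return len(parent) * gini(parent) - len(left) * gini(left) - len(right) * gini(right)


# Hypothetical node of 10 participants (1 = became disabled), split on a yes/no item
parent = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
left = [1, 1, 1, 0]          # e.g., answered "yes"
right = [1, 0, 0, 0, 0, 0]   # e.g., answered "no"
print(gini(parent))
print(weighted_gini_decrease(parent, left, right))
```

A split that separates cases from non-cases well yields a large decrease, so predictors used in many such splits accumulate high importance.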

Table 3 presents the simplified risk score for functional disability based on our ridge regression model. Figure 3 shows the distribution of the risk score and the percentage of participants who experienced the event. The continuous risk score showed good performance (C statistic = 0.792). The Youden index suggested that a cut-off of 6/7 points was optimal (sensitivity = 0.746, specificity = 0.699); that is, those with a score of 7 or higher are at high risk of functional disability.
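The two steps behind the simplified score can be sketched as follows: converting ridge-regression importances into integer points (the tenth feature gets 1 point, the others points proportional to importance) and choosing a cut-off that maximizes the Youden index J = sensitivity + specificity − 1. This sketch assumes that "rounded off" means rounding half up and that a score at or above the cut-off is classified as high risk. The feature names and numbers are hypothetical, not the study’s coefficients.

```python
import math
from typing import Dict, Sequence, Tuple


def points_from_importance(importance: Dict[str, float], top_k: int = 10) -> Dict[str, int]:
    """Keep the top_k features; the k-th gets 1 point, others importance / importance_k."""
    ranked = sorted(importance.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    ref = abs(ranked[-1][1])  # importance of the k-th feature defines 1 point
    # math.floor(x + 0.5) rounds half up (assumed); Python's round() rounds half to even
    return {name: math.floor(abs(w) / ref + 0.5) for name, w in ranked}


def youden_cutoff(scores: Sequence[int], y_true: Sequence[int]) -> Tuple[int, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A participant is classified as high risk when score >= cutoff.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both outcomes to compute sensitivity and specificity")
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, y_true) if s >= c and y == 1)
        tn = sum(1 for s, y in zip(scores, y_true) if s < c and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    _, c, sens, spec = best
    return c, sens, spec


# Hypothetical importances; top_k=3 keeps the example short
importance = {"age": 0.92, "self_rated_health": 0.50, "falls": 0.30, "other_item": 0.05}
print(points_from_importance(importance, top_k=3))  # {'age': 3, 'self_rated_health': 2, 'falls': 1}

# Hypothetical total scores and outcomes (1 = became disabled)
scores = [0, 1, 2, 3, 4, 5, 6, 7]
outcome = [0, 0, 1, 0, 0, 1, 1, 1]
print(youden_cutoff(scores, outcome))  # (5, 0.75, 1.0)
```

Under this convention, the 6/7-point cut-off reported above corresponds to `cutoff = 7`.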

Table 3 Simplified Risk Score for Functional Disability Based on Ridge Regression
Figure 3 Distribution of the risk score and the percentage of participants who became disabled.

We performed several sensitivity analyses. First, we excluded participants who were certified as needing long-term care within 1 year of the baseline survey. The C statistics declined in all models but remained good (0.809 for ridge regression and 0.807 for gradient boosting; Supplementary Table 3). Second, following the definition used in a previous study,22 we tested prediction performance for the onset of severe disability (i.e., certification at care-needs level 2 or higher, which indicates a need for care in basic activities of daily living). Performance for predicting these more severe conditions was lower than that for any certified level but still good (0.805 for ridge regression and 0.804 for gradient boosting; Supplementary Table 4). As in our main models, both prediction models for this alternative cut-off identified age, self-rated health, and diagnoses of Parkinson’s disease and dementia as important features (Supplementary Fig. 2). In the alternative ridge regression, use of an electric wheelchair and body mass index also appeared as important features (Supplementary Fig. 2A). In the alternative gradient boosting model, several variables related to instrumental activities of daily living (e.g., going shopping and filling out documents) were selected (Supplementary Fig. 2B). Third, we performed a Cox proportional hazards regression. During the study period, some participants developed functional disability early, others developed it late, and others were censored without developing it. However, our main models predicted only whether a participant developed functional disability, regardless of how long the participant remained free of it. Thus, a prediction model that accounts for the time to event might perform better.
Our Cox model included a ridge penalty term to prevent overfitting.14 The Cox model performed similarly to the logistic regression with ridge regularization and to gradient boosting (0.817; Supplementary Table 5). Fourth, we examined whether a voting ensemble combining the four algorithms improved performance; the improvement was slight (0.819; Supplementary Table 5). Finally, we tested the performance of the 25-item KCL, which is often used in Japan to screen for people at high risk of functional disability. Although its performance was acceptable (0.716 for ridge regression and 0.717 for gradient boosting; Supplementary Table 6), our machine learning–based models performed better.
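For time-to-event models such as the Cox regression, discrimination is commonly summarized with Harrell’s C, which extends the C statistic to censored follow-up: a pair is usable only when the participant with the shorter follow-up actually developed disability. The text does not state which concordance measure was reported for the Cox model, so the sketch below assumes Harrell’s version. It simply skips pairs with tied follow-up times, and the function name and toy data are hypothetical.

```python
from typing import Sequence


def harrell_c(time: Sequence[float], event: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data (simplified).

    A pair (i, j) is comparable when subject i had the event and a strictly
    shorter follow-up time than subject j. It is concordant when i also has the
    higher predicted risk; ties in predicted risk count as 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up in years, event indicator (1 = disability, 0 = censored), predicted risk
years = [1.0, 2.5, 3.0, 5.4, 5.4]
became_disabled = [1, 1, 0, 1, 0]
predicted_risk = [0.9, 0.6, 0.7, 0.2, 0.1]
print(harrell_c(years, became_disabled, predicted_risk))  # 6 of 7 comparable pairs are concordant
```

With no censoring and a common follow-up time, comparable pairs reduce to case/non-case pairs, so the value is directly comparable with the C statistics of the logistic models.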

DISCUSSION

This study used machine learning algorithms to construct prediction models of functional disability over more than 5 years among community-dwelling older adults. Among the models, ridge regression and gradient boosting predicted functional disability effectively. Machine learning improved prediction performance compared with previously developed models. Existing models not based on machine learning showed median C statistics ranging between 0.65 and 0.76 for development models and between 0.60 and 0.68 for validation models.5 The 3-year prediction model developed by Tsuji and colleagues showed a C statistic of 0.804,6 whereas our model achieved a higher C statistic over a longer prediction horizon. Although the KCL (excluding five items related to depression) showed good performance in predicting functional disability within 1 year (C statistic = 0.83),23 our additional analysis suggested that its performance declines when the prediction horizon exceeds 5 years. The simplified risk score based on our ridge regression also showed good performance. Our findings suggest that machine learning may enable more precise identification of those at high risk of functional disability and more effective targeting of preventive measures.

Several important features were identified in both models. Both models identified variables related to falls and postural stability as important features, namely the frequency of falls within 1 year, worry about falling, and the ability to climb stairs and to stand up without support. These variables may reflect the process of age-related functional decline: people with frailty have difficulty climbing stairs and standing up on their own and are more likely to fall.3 Falls and traumatic injuries increase the risk of functional disability.24 These four variables are also included in the KCL used in Japan’s long-term care insurance and in the risk score of functional disability developed by Tsuji and colleagues.6,23 Consistent with these previous models, this study suggests that including these measures can improve the prediction of functional disability. In addition, our models suggest that diagnoses of Parkinson’s disease and dementia are important predictors of functional disability, as previous studies have found.25,26 These neurodegenerative diseases are common in the older population and result in functional impairment.27,28

We also found that self-rated health predicted functional disability, consistent with previous studies.29,30,31 Idler and Benyamini proposed four reasons why self-rated health predicts functional disability effectively: (1) it is more inclusive than other measures; (2) it captures not only current health status but also its trajectory; (3) it affects behaviors that influence future health status; and (4) it reflects the resources that one can access when health declines.32 Our findings suggest that self-rated health is a simple and useful measure for predicting functional disability among older adults.

Furthermore, the ridge regression and gradient boosting models each identified important features that the other model did not. In the ridge regression, household characteristics such as the number of household members, income, and receipt of public assistance were selected as important predictors. A previous study reported that the size of social networks, including family members, was not associated with functional disability.33 In contrast, the present study suggests that household size matters: those who developed functional disability had smaller households than those who did not. Household income and receipt of public assistance may reflect the socioeconomic gradient of functional disability shown in previous literature.34,35,36

In the gradient boosting model, moderate physical activity and driving were identified as important features. Interestingly, among the physical activity variables, moderate physical activity was the best predictor, although vigorous (e.g., running, swimming, cycling, tennis, exercise at the gym, and mountain climbing) and light (e.g., stretching, bowling, walking to shops or the station, and laundry) physical activities were also candidate predictors in the model. We also found that driving was a good predictor of functional disability in older adults. To prevent motor-vehicle collisions involving older drivers, the Japanese National Police Agency requires drivers aged ≥ 75 years to attend a special lecture and to take a cognitive function test and a driving skills test when renewing their driver’s license, and it also incentivizes the voluntary surrender of licenses. Given such stringent measures in Japan, driving may be a proxy for the retention of physical function.

This study has several limitations. First, objectively measured variables could not be included as candidates. Prospective studies have shown that objective measures of physical function, such as gait speed, one-leg standing time, and handgrip strength, can improve the prediction of functional disability.37,38 Although adding objectively measured variables might further improve prediction performance, this study showed that prediction models constructed only from self-reported variables could predict functional disability with good performance. Second, this study developed prediction models, not causal models; thus, causality should not be inferred from our findings. Reverse causation and other biases may affect the associations between the identified predictors and functional disability. Readers should not interpret the results etiologically but should use them to calculate the risk score of functional disability.39 Further studies are required to confirm causality and to propose preventive measures for functional disability. Third, some residents did not respond to the survey, which could have caused selection bias. We could not assess the impact of non-response because we had no data on non-respondents. However, a response rate of > 70% is comparable to or even higher than that of similar surveys of community-dwelling older adults.40 Fourth, because we aimed to predict functional disability for individuals, individual-level clinical and biological factors were selected as important features. However, contextual factors should also be considered for community health. Previous studies demonstrated that living in a community with active social participation and rich social cohesion was associated with a lower incidence of functional disability.7,8,9,22 Fifth, we combined all levels of needing long-term care as the outcome to predict the onset of functional disability.
However, the clinical conditions of a person certified at support-needs level 1 and a person certified at care-needs level 5 are very different. In a sensitivity analysis using care-needs level 2 as an alternative cut-off, the alternative models selected many of the same variables as our primary models, but some differed. Other prediction models may perform better for functional disability defined by different cut-offs or levels of severity. Finally, we studied Japanese older adults, so the generalizability of our findings to other countries may be limited.

In summary, we present prediction models for functional disability that include important features selected from 183 candidate predictors using machine learning algorithms. The models showed good predictive performance over more than 5 years of follow-up. Our findings suggest that measuring and including the variables identified as important features by ridge regression and gradient boosting can improve the prediction of functional disability. This study provides researchers and policymakers with valuable insights for improving the prediction of functional disability in community-dwelling older adults.