Limited clinical utility of a machine learning revision prediction model based on a national hip arthroscopy registry

Martin, R. Kyle; Wastvedt, Solvejg; Lange, Jeppe; Pareek, Ayoosh; Wolfson, Julian; Lund, Bent

doi:10.1007/s00167-022-07054-8

HIP
Open access
Published: 10 August 2022

Volume 31, pages 2079–2089, (2023)

Knee Surgery, Sports Traumatology, Arthroscopy

R. Kyle Martin (ORCID: orcid.org/0000-0001-9918-0264)¹,²,
Solvejg Wastvedt³,
Jeppe Lange⁴,⁵,
Ayoosh Pareek⁶,
Julian Wolfson³ &
Bent Lund⁴,⁷

Abstract

Purpose

Accurate prediction of outcome following hip arthroscopy is challenging and machine learning has the potential to improve our predictive capability. The purpose of this study was to determine if machine learning analysis of the Danish Hip Arthroscopy Registry (DHAR) can develop a clinically meaningful calculator for predicting the probability of a patient undergoing subsequent revision surgery following primary hip arthroscopy.

Methods

Machine learning analysis was performed on the DHAR. The primary outcome for the models was probability of revision hip arthroscopy within 1, 2, and/or 5 years after primary hip arthroscopy. Data were split randomly into training (75%) and test (25%) sets. Four models intended for these types of data were tested: Cox elastic net, random survival forest, gradient boosted regression (GBM), and super learner. These four models represent a range of approaches to statistical details like variable selection and model complexity. Model performance was assessed by calculating calibration and area under the curve (AUC). Analysis was performed using only variables available in the pre-operative clinical setting and then repeated to compare model performance using all variables available in the registry.

Results

In total, 5581 patients were included for analysis. Average follow-up time or time-to-revision was 4.25 (± 2.51) years and overall revision rate was 11%. All four models were generally well calibrated and demonstrated concordance in the moderate range when restricted to only pre-operative variables (0.62–0.67), and when considering all variables available in the registry (0.63–0.66). The 95% confidence intervals for model concordance were wide for both analyses, ranging from a low of 0.53 to a high of 0.75, indicating uncertainty about the true accuracy of the models.

Conclusion

The association between pre-surgical factors and outcome following hip arthroscopy is complex. Machine learning analysis of the DHAR produced a model capable of predicting revision surgery risk following primary hip arthroscopy that demonstrated moderate accuracy but likely limited clinical usefulness. Prediction accuracy would benefit from enhanced data quality within the registry and this preliminary study holds promise for future model generation as the DHAR matures. Ongoing collection of high-quality data by the DHAR should enable improved patient-specific outcome prediction that is generalisable across the population.

Level of evidence

Level III.

Introduction

In 2003, Ganz et al. described femoroacetabular impingement (FAI) as one of the primary causes of hip osteoarthritis [10]. Over the last two decades, hip arthroscopy has been increasingly performed for the treatment of this intra-articular hip disorder along with cartilage and labral injuries [4, 7, 42]. As the annual number of procedures has increased, many studies have sought to evaluate the risk of undergoing a subsequent revision hip arthroscopy [1, 2, 5, 6, 9, 11, 12, 14, 17, 23, 28, 32, 34, 38]. Though these studies have identified several risk factors associated with revision surgery, the ability to translate these pre-operative factors into a specific risk score is poor. A clinical tool to estimate a patient’s individual risk of having subsequent revision hip arthroscopy would be a valuable adjunct for the surgeon to guide discussions regarding surgical decision-making and expectations.

Machine learning has the potential to improve the ability to estimate outcome at an individual level. Machine learning uses data to build flexible prediction and decision-making models without the need for researchers to pre-specify how predictors relate to each other and to the outcome of interest. Through analysis of large clinical datasets, machine learning models can identify factors associated with outcome and use these factors to formulate prospective predictive algorithms. The ideal database for clinically useful machine learning analysis is one that contains a large volume of patient data that is representative of a diverse portion of the population under evaluation. National registries represent a potentially strong data source which hold promise for the development of clinically impactful outcome prediction models due to the large volume of patients from multiple institutions and surgeons.

The Danish Hip Arthroscopy Registry (DHAR) has been prospectively collecting demographic, surgical, and outcome data since 2012. There are currently more than 6000 patients registered in the database who have undergone hip arthroscopy throughout Denmark. This national registry has yielded several clinically useful contributions to the orthopaedic literature [15, 26, 27, 29, 30, 31], and machine learning enables further analysis. The purpose of this study was to apply machine learning to the DHAR with the primary goal of developing a clinically useful algorithm capable of predicting subsequent revision hip arthroscopy. The hypothesis was that a resulting algorithm would be able to accurately estimate a patient’s risk of subsequent revision hip arthroscopy based on variables available in the pre-operative clinical setting. If successful, the resulting prediction model could be implemented in the clinic as an online calculator to guide discussions regarding surgical decision-making and outcome expectations at a patient-specific level.

Materials and methods

At the time of data entry in the DHAR, all patients provide informed consent. The DHAR complies with all current national data protection legislation. Data management in the current study was performed confidentially according to Danish and European Union (EU) data protection rules, with all data de-identified prior to retrieval for analysis. As this was a register-based study, ethical approval was automatically waived according to national legislation.

Transparent reporting

This manuscript was written in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement [3]. The TRIPOD statement represents recommendations for studies developing and/or validating prediction models. The goal of the TRIPOD statement is to improve the transparency of prediction model studies through full and clear information reporting and includes a 22-item checklist.

Data preparation

Patients in the DHAR with primary hip arthroscopy dates between January 2012 and December 2020 were included. A full list of variables used in the analysis is shown in Table 1a (pre-operative variables only) and 1b (intraoperative variables). Patients with previous surgery to the same hip were excluded to focus model prediction on patients undergoing primary hip arthroscopy for FAI. Additionally, a small number of patients with a history of Legg-Calvé-Perthes disease, developmental dysplasia of the hip, avascular necrosis, slipped capital femoral epiphysis, or hip fracture were excluded to limit heterogeneity of the population and focus on surgical management of primary FAI. New variables were defined for type of previous injury to same hip (acetabular dysplasia, FAI), an indicator if the patient was missing any patient reported outcome variable, type of labral repair anchors (bioabsorbable, PEEK, all suture), number of anchors, type of knots, type of cartilage treatment (microfracture, fixation/resection), and type of other pathology found (adhesions, partial/full ligamentum teres rupture, synovitis, bursitis, calcified labrum, os acetabuli, loose bodies, other). The following variables were recoded: MRI performed (non-contrast, arthrogram) and Tönnis grade (Grades 0, 1, 2, 3, and missing). Time to revision was calculated as number of months from primary hip arthroscopy to revision. For assessing concordance at specific follow-up times, patients with a revision at or prior to the time point were considered as having experienced the event.
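The event definition at a fixed follow-up time can be illustrated with a minimal Python sketch. The original analysis was performed in R; the function name, argument names, and status labels below are hypothetical and chosen only for illustration, and the patient values are invented.

```python
from typing import Optional


def status_at_horizon(months_to_revision: Optional[float],
                      months_of_follow_up: float,
                      horizon_years: int) -> str:
    """Classify one patient at a fixed follow-up time.

    months_to_revision is None when no revision has been recorded.
    """
    horizon_months = 12 * horizon_years
    if months_to_revision is not None and months_to_revision <= horizon_months:
        return "revised"        # revision at or before the time point
    if months_of_follow_up >= horizon_months:
        return "not revised"    # complete follow-up without revision
    return "censored"           # follow-up ended before the time point


# Invented (months_to_revision, months_of_follow_up) pairs.
patients = [(8.0, 8.0), (None, 30.0), (40.0, 40.0), (None, 70.0)]
for horizon in (1, 2, 5):
    print(horizon, [status_at_horizon(r, f, horizon) for r, f in patients])
```

A patient revised at 40 months counts as "not revised" at the 1- and 2-year time points but as "revised" at 5 years, which is why the same patient can contribute differently to each follow-up time.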

Table 1 Characteristics of patients

Machine learning modelling

The cleaned data were split randomly into training (75%) and test (25%) sets for model fitting and evaluation, respectively. The primary outcome for the models was probability of revision hip arthroscopy within 1, 2, and/or 5 years after primary hip arthroscopy. This approach utilises a survival-analysis temporal framing structure [25]. The program R (version 4.1.1, R Core Team 2021, R Foundation for Statistical Computing, Vienna, Austria) was used to fit and evaluate several models adapted for censored, time-to-event data. “Censoring” refers to the fact that at any given time, complete information is not known for all the patients in the registry. For example, if a patient has two years of follow-up after primary surgery with no revision, we do not know if or when that patient will go on to have a revision. Models adapted for censored data allow use of the partial information contained in these censored observations while accounting for the incompleteness.
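A random 75/25 split of this kind might be sketched as follows. This is an illustrative Python example rather than the R code used in the study; the function name, the seed, and the use of integer patient identifiers are assumptions.

```python
import random


def train_test_split(ids, test_fraction=0.25, seed=0):
    """Randomly partition patient identifiers into training and test sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical identifiers 0..5580, one per included patient.
train_ids, test_ids = train_test_split(range(5581))
print(len(train_ids), len(test_ids))
```

Fixing the seed makes the partition reproducible, so that every model is fit and evaluated on the same patients.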

The following four machine learning models were used: Cox elastic net, random survival forest, gradient boosted regression (GBM), and super learner. The Cox elastic net is a penalised, semi-parametric regression model that selects a subset of the predictors for inclusion in the model. “Elastic net” refers to the combination of L1 and L2 penalties used to shrink model coefficients toward zero [35]. The random survival forest is an adaptation of the popular tree-based random forest method for censored data. It uses all predictors and is nonparametric, meaning it does not require specification of the model structure [16]. The GBM is also tree based and nonparametric. It iteratively improves the model fit using all predictors [8]. The super learner is an “ensemble” technique that averages over model fits from several different types of models for an even more flexible approach [24]. Our super learner combined all the other three model types: Cox elastic net, random survival forest, and GBM.
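For reference, the elastic net penalty can be written out explicitly. The form below follows the parametrisation commonly documented for glmnet and omits any scaling of the likelihood term; the symbols β (coefficient vector), p (number of predictors), and ℓ (Cox partial log-likelihood) are introduced here for illustration only.

```latex
\hat{\beta} = \arg\max_{\beta}\left\{ \ell(\beta) - \lambda \left[ \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert + \frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^{2} \right] \right\}
```

Here λ ≥ 0 sets the overall strength of shrinkage and α ∈ [0, 1] mixes the L1 penalty (α = 1, lasso) with the L2 penalty (α = 0, ridge). Values of α close to 1 favour sparse solutions, which is what allows a subset of predictors to be selected.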

The Cox elastic net model (package glmnet, alpha value 0.9, lambda value selected via cross-validation) was fit to the data and predictors with non-zero coefficients were retained, shown in the top panel of Fig. 1. The random survival forest, GBM, and super learner were fit using a grid search method to arrive at hyperparameters (package MachineShop). The grid search method compares all possible combinations of a given set of hyperparameters to find the best fit based on a specified performance metric, for which the C-Index was used, as described below. The random survival forest (package randomForestSRC) used 1000 trees, a minimum node size of 200, and 10 variables tried per split. The GBM (package gbm) used 1000 trees, an interaction depth of 3, a minimum node size of 100, and shrinkage of 0.01. The super learner model (SuperModel function, package MachineShop) combined the three previous models with the specified hyperparameters.
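The idea behind a grid search can be sketched in a few lines of Python. This is not the MachineShop implementation: the function names, the candidate values in the grid, and the placeholder scoring function are all assumptions made for illustration. In practice the score would be a cross-validated C-Index obtained by fitting the model with each combination.

```python
from itertools import product


def grid_search(grid, score):
    """Return the best-scoring combination from a hyperparameter grid.

    grid  : dict mapping hyperparameter name -> list of candidate values
    score : callable taking a dict of hyperparameters and returning a
            number where larger is better (e.g. a cross-validated C-Index)
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    # Enumerate every combination of candidate values.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


# Hypothetical grid; the candidate values are illustrative only.
grid = {"n_trees": [500, 1000], "min_node_size": [100, 200], "mtry": [5, 10]}


def toy_score(p):
    # Placeholder for "fit the model and compute the C-Index".
    # It is NOT a real model fit and simply peaks at one combination.
    return (-abs(p["n_trees"] - 1000)
            - abs(p["min_node_size"] - 200)
            - abs(p["mtry"] - 10))


print(grid_search(grid, toy_score))
```

Because every combination is evaluated, the cost grows multiplicatively with the number of candidate values per hyperparameter, which is one reason grids are usually kept small.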

Each of the machine learning models was fit using two different sets of predictors: all predictors, and all predictors excluding intraoperative variables (Table 1a). The two separate analyses allowed for comparison of model performance given only variables available in the pre-operative setting versus a model considering all variables available after surgical intervention.

Model evaluation

Performance measures adapted for censored data were used to evaluate the four models on survival probabilities calculated for the hold-out test set. A measure of model concordance adapted for censored data, Harrell’s C-Index, was used at 1-, 2-, and 5-year follow-up times. The C-Index computes the proportion of pairs of observations in which predicted survival probability ranking corresponds to actual ranking [13]. It is a generalisation of the common area under the receiver operating characteristic curve (AUC) metric for censored data and, as with AUC, ranges from 0 to 1 with 1 indicating perfect concordance and 0.5 representing random chance. Concordance is a measure of the model’s ability to differentiate between patients who do and do not experience the event. A model is said to have perfect concordance if the predicted risks for all individuals who experience the outcome are higher than those for all those who do not. Most clinically useful prediction models have a concordance in the 0.65–0.8 range [41]. Calibration, adapted for censored data, was also calculated. Calibration measures the accuracy of the predicted probabilities by comparing actual to expected outcomes. For this purpose, a version of the Hosmer–Lemeshow statistic intended for censored data was used. The statistic sums average misclassification in predicted risk quintiles and converts the sum into a chi-squared statistic [37]. Larger values of the calibration statistic indicate worse accuracy and produce smaller p-values. Statistical significance of the calibration statistic means we reject the null hypothesis of perfect calibration. Each of these performance metrics was calculated separately for models trained using the full set of predictors and pre-operative variables only.
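To make the pairwise definition of concordance concrete, the following Python sketch computes Harrell’s C-Index in its basic form. It is a simplified illustration and not the code used in the study: it does not truncate at a specific follow-up time, the function and argument names are assumptions, and the example data are invented. Risk is taken as the complement of predicted survival probability, so a higher risk should correspond to an earlier event.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C-Index for right-censored data.

    times  : observed time (event or censoring) for each patient
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted risk score; higher means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # ranking agrees with outcome order
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented example: four patients, two observed revisions.
times = [2, 4, 6, 8]
events = [1, 0, 1, 0]
risks = [0.9, 0.4, 0.1, 0.5]
print(harrell_c_index(times, events, risks))
```

In this example there are four comparable pairs, of which three are ranked correctly, giving a C-Index of 0.75. Pairs in which the earlier observation is censored are skipped because the true ordering of the two event times is unknown.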

Missing data

Because of high rates of missing data (Table 1) on some variables used for prediction, imputation was performed on the cleaned data prior to analysis. The imputation was performed via random forest (function missForest in package missForest) to arrive at a single imputed data set for each of the training and test data. The random forest imputation method trains a random forest on the observed data and uses it to predict imputed values for missing data [36]. To avoid leakage between the training and test data, the forest was trained only on the observed training data and was then used to predict for both training and test sets. All models were fit and evaluated on the imputed training and test sets, respectively. Imputation was performed separately for the two analyses described above (pre-operative only and all variables). In each case, only the predictor variables included in the specified analysis were used in imputation.
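The leakage-avoiding pattern described here (learn the imputation from training data only, then apply it to both sets) can be sketched as below. To keep the example self-contained, a simple column mean is swapped in for the random forest imputer; this is a deliberate simplification and not the missForest method. The function names, variable names, and values are hypothetical.

```python
from statistics import mean


def fit_mean_imputer(train_rows):
    """Learn per-column fill values from observed training data only."""
    columns = train_rows[0].keys()
    return {c: mean(r[c] for r in train_rows if r[c] is not None)
            for c in columns}


def apply_imputer(rows, fill_values):
    """Replace missing values (None) with the training-derived fill values."""
    return [{c: (fill_values[c] if v is None else v) for c, v in r.items()}
            for r in rows]


# Invented records; None marks a missing value.
train = [
    {"age": 30, "alpha_angle": 70},
    {"age": None, "alpha_angle": 60},
    {"age": 40, "alpha_angle": None},
]
test = [{"age": None, "alpha_angle": 90}]

fills = fit_mean_imputer(train)              # test data are never seen here
train_imputed = apply_imputer(train, fills)
test_imputed = apply_imputer(test, fills)
print(test_imputed)
```

The key design choice is that the test set contributes nothing to the fitted imputer, so performance measured on the test set is not inflated by information from those patients.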

Results

Data characteristics

After data cleaning, 5581 patients were included in the analysis (713 patients excluded for previous hip surgery, 16 more patients excluded based on type of previous injury to same hip). Table 1 describes the characteristics of the population at the time of primary hip arthroscopy and lists all predictor variables considered in the analysis. Of the patients included after data cleaning, 603 (11%) underwent revision surgery, during an average follow-up time of 4.25 years (SD 2.51). The population was predominantly female (3079 patients; 55%), the average alpha angle was 67° (SD 14), average Tönnis grade was 0, and the majority had unilateral hip pain (3824 patients; 69%). Table 2 describes the number of patients who experienced revision at or before 1, 2, and 5 years post primary surgery as well as the number with complete follow-up but no revision, and the number censored before the follow-up time.

Table 2 Description of censoring

Machine learning model performance

The four models exhibited concordance in the moderate range across the follow-up times when restricted to only pre-operative variables (0.62–0.67) and exhibited similar concordance when using all variables (Tables 3, 4). The 95% confidence intervals for model concordance were wide for both analyses, ranging from a low of 0.53 to a high of 0.75, indicating uncertainty about the true concordance of the models. The random survival forest and GBM had a slight edge over the other two models in terms of concordance at 1-, 2-, and 5-year follow-up times using only pre-operative variables. The GBM had the best concordance of the models for the analysis using all variables. In general, the models were well calibrated, with only the random survival forest showing evidence of mis-calibration at 1 year (p value less than 0.01) and slight evidence of mis-calibration at 5 years (p value between 0.01 and 0.05) for the analysis restricted to pre-operative variables. For the analysis using all variables, only the Cox elastic net model showed evidence of mis-calibration at 1 year and slight evidence of mis-calibration at 5 years.

Table 3 Model performance

Table 4 Model performance – all variables

Factors predicting risk of revision surgery

Variables with non-zero coefficients in the pre-operative variable Cox elastic net model were, in order of importance: sex, pre-operative HAGOS Quality of Life score, pre-operative NRS Activity score, and pre-operative HAGOS Symptoms and Sport scores. The relative importance of these variables for predicting probability of revision surgery is shown in the top panel of Fig. 1, where the size of each bar corresponds to the absolute value of the variable’s effect size. Variables in the top third by importance for the other three pre-operative variable models also included pre-operative HAGOS scores, pre-operative NRS Activity score, and sex (random survival forest and super learner). However, age at surgery was the most important variable for these three models (Fig. 1, bottom three panels). The random survival forest and super learner models use permutation-based variable importance, which measures importance as the relative change in model performance upon randomly permuting values of the given variable. The GBM quantifies importance as the difference in error rate if the variable were removed.
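Permutation-based importance can be illustrated with a short Python sketch. It reports the average absolute drop in a performance metric rather than a relative change, and it is not the implementation used by randomForestSRC or MachineShop; the function names, the toy model, the metric, and the data are all assumptions.

```python
import random


def permutation_importance(predict, rows, outcomes, metric, variable,
                           n_repeats=20, seed=0):
    """Average drop in a performance metric after shuffling one variable.

    predict : callable mapping a list of row dicts to a list of predictions
    metric  : callable (outcomes, predictions) -> score, larger is better
    """
    rng = random.Random(seed)
    baseline = metric(outcomes, predict(rows))
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[variable] for r in rows]
        rng.shuffle(shuffled)  # break the link between variable and outcome
        permuted = [{**r, variable: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - metric(outcomes, predict(permuted)))
    return sum(drops) / n_repeats


# Invented data: the toy model depends on "age" and ignores "noise".
rows = [{"age": a, "noise": n}
        for a, n in [(20, 3), (25, 1), (30, 4), (35, 1), (40, 5), (45, 9)]]
outcomes = [2 * r["age"] for r in rows]


def predict(rs):
    return [2 * r["age"] for r in rs]


def neg_mae(y, yhat):
    # Negative mean absolute error, so that larger is better.
    return -sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


print(permutation_importance(predict, rows, outcomes, neg_mae, "age"))
print(permutation_importance(predict, rows, outcomes, neg_mae, "noise"))
```

Shuffling a variable the model relies on degrades performance, whereas shuffling one it ignores leaves performance unchanged, so the second value printed is zero.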

Discussion

The most important finding of this study is that while machine learning analysis of a national hip arthroscopy registry enabled the development of algorithms capable of predicting subsequent revision surgery, the clinical utility of these models is likely limited. Analysis was performed using only variables that would be available in the pre-operative setting and again using the full data set. Both scenarios resulted in well-calibrated models with moderate concordance, but also with wide confidence intervals that approached random chance. Overall, the analysis was limited by a substantial proportion of missing data but encourages optimism for future models if data collection can be improved.

Machine learning represents an approach to health care research that is increasingly being applied to analyse large orthopaedic databases. The main advantage of machine learning relates to the ability of the technique to realise complex associations and relationships within large datasets. With minimal direct human programming, these models can “learn” which factors are associated with a specified outcome and can then create an algorithm with the goal of accurate outcome prediction. The most common machine learning applications in orthopaedic surgery involve clinical prediction modelling and automated image interpretation. It is anticipated that machine learning models will serve as a valuable adjunct for clinicians in the future, guiding clinical discussions at a patient-specific level.

Within the field of hip arthroscopy, several studies have now been performed that seek to predict patient-specific outcome following the procedure. Most have focused on patient reported outcome, with Kunze et al. analysing single-surgeon data to predict multiple post-operative endpoints based on different outcome measuring tools [19, 20, 21, 22]. The prediction of subsequent surgery following hip arthroscopy has also been performed by Haeberle et al. based on another single-surgeon database of over 3000 patients [11]. With their study, Haeberle et al. achieved an AUC of 0.77 ± 0.08 for predicting a patient’s risk of subsequent revision hip arthroscopy. These early studies show promise for clinical usefulness of hip arthroscopy prediction models but are of uncertain real-world applicability due to the single-surgeon nature of the databases and lack of external validation.

This study represents the first national registry-based machine learning model for hip arthroscopy outcome prediction. The goal of the present study was to develop an accurate model based on pre-operative variables that could provide a risk estimate for subsequent hip arthroscopy at a patient-specific level. This would allow a surgeon to input their patient’s data into a prediction calculator during the initial patient encounter and estimate that patient’s individual revision surgery risk. This information could then guide expectations and the surgical discussion with the patient. While the results of this study did demonstrate the ability to predict revision surgery with reasonable accuracy, the wide discrimination confidence interval likely limits the clinical utility of the algorithms.

There are some possible explanations for the inferior model performance of the present study relative to the revision hip arthroscopy model developed by Haeberle et al. [11]. Although overall compliance with the DHAR is between 78% and 97% annually [43], the limited completeness of the data restricts the ability of the models to accurately predict outcome. This is partly because, as the DHAR evolved from the initial stages through to the present version, some variables were added, removed, or modified, which contributes to the data inconsistency. Variance within the DHAR is also expected given the multiple-surgeon nature of the registry, while the single-surgeon institutional registry likely benefits from more overall consistency. As more patients are enrolled and the data collection stability improves, it is anticipated that future machine learning analysis of the DHAR may yield improved prediction accuracy.

The variables recorded in the DHAR itself may also limit the ability of machine learning analysis to develop useful risk prediction models. The multiple factors included in the register were chosen by the founding surgeons as they were felt to be the most relevant based on current literature. It is possible that some factors not currently included in the DHAR may in fact be more strongly associated with outcome and thus, their exclusion may bias the models toward suboptimal performance. Future analysis may clarify this limitation and the advancements of other machine learning techniques such as computer vision [18] and natural language processing [39, 40] may make register-based data collection both simpler and more comprehensive.

Substantial missing data represent the main limitation of this study, but there are other limitations to consider. First, four common machine learning models that represent various approaches to variable selection and model complexity were selected for data analysis, but it is possible that a model that was not considered may have performed better. Second, the analysis included all variables in the DHAR but there may be other factors associated with the risk of subsequent surgery which are not included in the registry and therefore not considered in our models. Some examples of factors that may be relevant for outcome prediction include clinical examination findings, rehabilitation details, or raw imaging data files. The main concern regarding clinical applicability of this study lies in the accuracy of the model, with concordance limited by a wide confidence interval approaching random chance. Additionally, the ability to pre-operatively predict who is at risk of subsequent revision hip arthroscopy is likely limited by the endpoint itself. That is, a commonly cited reason for revision surgery is residual CAM deformity, a factor that is not known in the pre-operative setting [2, 12, 33, 38].

Although the results from this preliminary study are not suitable for immediate clinical application, they should serve as a baseline for future outcome prediction studies applying machine learning to large hip arthroscopy datasets. Additionally, there is optimism regarding the future development of patient-specific revision risk estimation if data collection can be improved. Accurate prediction of outcome using machine learning relies on both data quantity and quality. As a national registry, the DHAR will naturally continue to grow the quantity of data collected over time as all hip arthroscopy procedures performed in Denmark are captured. Data quality is more challenging to improve upon. Overcoming bias related to the surgeon-selected nature of the variables currently collected by the registry will require ongoing critical assessment over time, and emerging technology like natural language processing for data collection may enable the identification of additional variables that may influence outcome. Another way to potentially improve machine learning driven outcome prediction is through the creation of an international hip arthroscopy register or collaboration between national registers. International collaboration would require a pre-determined definition of a minimum common dataset across registers but would greatly improve predictive power through data sharing. Resulting algorithms could then be implemented into clinical practice to guide outcome expectations and discussions around surgical decision-making in the pre-surgical setting.

Conclusion

The association between pre-surgical factors and outcome following hip arthroscopy is complex. Machine learning analysis of the DHAR produced a model capable of predicting revision surgery risk following primary hip arthroscopy that demonstrated moderate accuracy but likely limited clinical usefulness. Prediction accuracy would benefit from enhanced data quality within the registry and this preliminary study holds promise for future model generation as the DHAR matures. Ongoing collection of high-quality data by the DHAR should enable improved patient-specific outcome prediction that is generalisable across the population.

References

Beals TR, Soares RW, Briggs KK, Day HK, Philippon MJ (2022) Ten-year outcomes after hip arthroscopy in patients with femoroacetabular im**ement and borderline dysplasia. Am J Sports Med 50:739–745
Article PubMed Google Scholar
Bogunovic L, Gottlieb M, Pashos G, Baca G, Clohisy JC (2013) Why do hip arthroscopy procedures fail? Clin Orthop Relat Res 471:2523–2529
Article PubMed PubMed Central Google Scholar
Collins GS, Reitsma JB, Altman DG, Moons KGM (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 162:55–63
Article PubMed Google Scholar
Cvetanovich GL, Chalmers PN, Levy DM, Mather RC, Harris JD, Bush-Joseph CA, Nho SJ (2016) Hip arthroscopy surgical volume trends and 30-day postoperative complications. Arthroscopy 32:1286–1292
Article PubMed Google Scholar
Degen RM, McClure JA, Le B, Welk B, Lanting B, Marsh JD (2022) Hip arthroscopy utilization and reoperation rates in Ontario: a population-based analysis comparing different age cohorts. Can J Surg 65:E228–E235
Article PubMed PubMed Central Google Scholar
Degen RM, Pan TJ, Chang B, Mehta N, Chamberlin PD, Ranawat AS, Nawabi DH, Kelly BT, Lyman S (2017) Risk of failure of primary hip arthroscopy-a population-based study. J Hip Preserv Surg 4:214–223
Article PubMed PubMed Central Google Scholar
Disegni E, Martinot P, Dartus J, Migaud H, Putman S, May O, Girard J, Chazard E (2021) Hip arthroscopy in France: an epidemiological study of postoperative care and outcomes involving 3699 patients. Orthop Traumatol Surg Res 107:102767
Article PubMed Google Scholar
Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378
Article Google Scholar
Fukase N, Murata Y, Pierpoint LA, Soares RW, Arner JW, Ruzbarsky JJ, Quinn PM, Philippon MJ (2022) Outcomes and survivorship at a median of 8.9 years following hip arthroscopy in adolescents with femoroacetabular im**ement: a matched comparative study with adults. J Bone Joint Surg 104:902–909
Article PubMed Google Scholar
Ganz R, Parvizi J, Beck M, Leunig M, Nötzli H, Siebenrock KA (2003) Femoroacetabular im**ement: a cause for osteoarthritis of the hip. Clin Orthop Relat Res 417:112–120
Article Google Scholar
Haeberle HS, Ramkumar PN, Karnuta JM, Sullivan S, Sink EL, Kelly BT, Ranawat AS, Nwachukwu BU (2021) Predicting the risk of subsequent hip surgery before primary hip arthroscopy for femoroacetabular im**ement syndrome: a machine learning analysis of preoperative risk factors in hip preservation. Am J Sports Med 49:2668–2676
Article PubMed Google Scholar
Haefeli PC, Albers CE, Steppacher SD, Tannast M, Büchler L (2017) What are the risk factors for revision surgery after hip arthroscopy for femoroacetabular im**ement at 7-year followup? Clin Orthop Relat Res 475:1169–1177
Article PubMed Google Scholar
Harrell FE (1982) Evaluating the yield of medical tests. JAMA J Am Med Assoc 247:2543
Article Google Scholar
Huang H-J, Dang H-H, Mamtimin M, Yang G, Zhang X, Wang J-Q (2022) Hip arthroscopy for femoroacetabular im**ement syndrome shows good outcomes and low revision rates, with young age and low postoperative pain score predicting excellent five-year outcomes. Arthroscopy S0749–8063(22):00193–00201
Google Scholar
Ishøi L, Thorborg K, Kraemer O, Lund B, Mygind-Klavsen B, Hölmich P (2019) Demographic and radiographic factors associated with intra-articular hip cartilage injury: a cross-sectional study of 1511 hip arthroscopy procedures. Am J Sports Med 47:2617–2625
Article PubMed Google Scholar
Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS (2008) Random survival forests. Ann Appl Stat 2:841–860
Article Google Scholar
Kester BS, Capogna B, Mahure SA, Ryan MK, Mollon B, Youm T (2018) Independent risk factors for revision surgery or conversion to total hip arthroplasty after hip arthroscopy: a review of a large statewide database from 2011 to 2012. Arthroscopy 34:464–470
Article PubMed Google Scholar
Ko S, Pareek A, Ro DH, Lu Y, Camp CL, Martin RK, Krych AJ (2022) Artificial intelligence in orthopedics: three strategies for deep learning with orthopedic specific imaging. Knee Surg Sports Traumatol Arthrosc 30:758–761
Article PubMed Google Scholar
Kunze KN, Polce EM, Clapp I, Nwachukwu BU, Chahla J, Nho SJ (2021) Machine learning algorithms predict functional improvement after hip arthroscopy for femoroacetabular im**ement syndrome in athletes. J Bone Jt Surg 103:1055–1062
Article Google Scholar
Kunze KN, Polce EM, Clapp IM, Alter T, Nho SJ (2022) Association between preoperative patient factors and clinically meaningful outcomes after hip arthroscopy for femoroacetabular im**ement syndrome: a machine learning analysis. Am J Sports Med 50(3):746–756
Article PubMed Google Scholar
Kunze KN, Polce EM, Nwachukwu BU, Chahla J, Nho SJ (2021) Development and internal validation of supervised machine learning algorithms for predicting clinically significant functional improvement in a mixed population of primary hip arthroscopy. Arthroscopy 37:1488–1497
Kunze KN, Polce EM, Rasio J, Nho SJ (2021) Machine learning algorithms predict clinically significant improvements in satisfaction after hip arthroscopy. Arthroscopy 37:1143–1151
Kuroda Y, Hashimoto S, Saito M, Hayashi S, Nakano N, Matsushita T, Niikura T, Kuroda R, Matsumoto T (2021) Obesity is associated with less favorable outcomes following hip arthroscopic surgery: a systematic review and meta-analysis. Knee Surg Sports Traumatol Arthrosc 29:1483–1493
van der Laan MJ, Polley EC, Hubbard AE (2007) Super learner. Stat Appl Genet Mol Biol. https://doi.org/10.2202/1544-6115.1309
Lauritsen SM, Thiesson B, Jørgensen MJ, Riis AH, Espelund US, Weile JB, Lange J (2021) The framing of machine learning risk prediction models illustrated by evaluation of sepsis in general wards. NPJ Digit Med 4:158
Lund B, Mygind-Klavsen B, Grønbech Nielsen T, Maagaard N, Kraemer O, Hölmich P, Winge S, Lind M (2017) Danish hip arthroscopy registry (DHAR): the outcome of patients with femoroacetabular impingement (FAI). J Hip Preserv Surg 4:170–177
Lund B, Nielsen TG, Lind M (2017) Cartilage status in FAI patients - results from the Danish Hip Arthroscopy Registry (DHAR). SICOT-J 3:44
Minkara AA, Westermann RW, Rosneck J, Lynch TS (2019) Systematic review and meta-analysis of outcomes after hip arthroscopy in femoroacetabular impingement. Am J Sports Med 47:488–500
Mygind-Klavsen B, Kraemer O, Hölmich P, Lund B (2020) An updated description of more than 5000 procedures from the Danish hip arthroscopy registry. J Bone Joint Surg 102:43–50
Mygind-Klavsen B, Lund B, Nielsen TG, Maagaard N, Kraemer O, Hölmich P, Winge S, Lind M (2019) Danish hip arthroscopy registry: predictors of outcome in patients with femoroacetabular impingement (FAI). Knee Surg Sports Traumatol Arthrosc 27:3110–3120
Mygind-Klavsen B, Nielsen TG, Lund B, Lind M (2021) Clinical outcomes after revision hip arthroscopy in patients with femoroacetabular impingement syndrome (FAIS) are inferior compared to primary procedures. Results from the Danish Hip Arthroscopy Registry (DHAR). Knee Surg Sports Traumatol Arthrosc 29:1340–1348
Philippon MJ, Ryan M, Martin MB, Huard J (2022) Capsulolabral adhesions after hip arthroscopy for the treatment of femoroacetabular impingement: strategies during rehabilitation and return to sport to reduce the risk of revision. Arthrosc Sports Med Rehabil 4:e255–e262
Philippon MJ, Schenker ML, Briggs KK, Kuppersmith DA, Maxwell RB, Stubbs AJ (2007) Revision hip arthroscopy. Am J Sports Med 35:1918–1921
Shah A, Kay J, Memon M, Simunovic N, Uchida S, Bonin N, Ayeni OR (2020) Clinical and radiographic predictors of failed hip arthroscopy in the management of dysplasia: a systematic review and proposal for classification. Knee Surg Sports Traumatol Arthrosc 28:1296–1310
Simon N, Friedman J, Hastie T, Tibshirani R (2011) Regularization paths for Cox’s proportional hazards model via coordinate descent. J Stat Softw 39(5):1–13. https://doi.org/10.18637/jss.v039.i05
Stekhoven DJ, Bühlmann P (2012) MissForest–non-parametric missing value imputation for mixed-type data. Bioinforma Oxf Engl 28:112–118
Vock DM, Wolfson J, Bandyopadhyay S, Adomavicius G, Johnson PE, Vazquez-Benitez G, O’Connor PJ (2016) Adapting machine learning techniques to censored time-to-event health record data: a general-purpose approach using inverse probability of censoring weighting. J Biomed Inform 61:119–131
West CR, Bedard NA, Duchman KR, Westermann RW, Callaghan JJ (2019) Rates and risk factors for revision hip arthroscopy. Iowa Orthop J 39:95–99
Wyatt JM, Booth GJ, Goldman AH (2021) Natural language processing and its use in orthopaedic research. Curr Rev Musculoskelet Med 14:392–396
Wyles CC, Tibbo ME, Fu S, Wang Y, Sohn S, Kremers WK, Berry DJ, Lewallen DG, Maradit-Kremers H (2019) Use of natural language processing algorithms to identify common data elements in operative notes for total hip arthroplasty. J Bone Joint Surg 101:1931–1938
Youngstrom EA (2014) A primer on receiver operating characteristic analysis and diagnostic efficiency statistics for pediatric psychology: we are ready to ROC. J Pediatr Psychol 39:204–221
Zusmanovich M, Haselman W, Serrano B, Banffy M (2022) The incidence of hip arthroscopy in patients with femoroacetabular impingement syndrome and labral pathology increased by 85% between 2011 and 2018 in the United States. Arthroscopy 38:82–87
(2019) The Danish Hip Arthroscopy Registry - Annual Report 2018. Annual Report, Denmark, p 20

Funding

This study was funded by a Norwegian Centennial Chair Seed Grant.

Author information

Authors and Affiliations

Department of Orthopaedic Surgery, University of Minnesota, 2512 South 7th Street, Suite R200, Minneapolis, MN, 55455, USA
R. Kyle Martin
Department of Orthopaedic Surgery, CentraCare, Saint Cloud, MN, USA
R. Kyle Martin
Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
Solvejg Wastvedt & Julian Wolfson
Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
Jeppe Lange & Bent Lund
CAAIR, Horsens Regional Hospital, Horsens, Denmark
Jeppe Lange
Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
Ayoosh Pareek
Department of Orthopedic Surgery, H-HiP, Horsens Regional Hospital, Horsens, Denmark
Bent Lund

Authors

R. Kyle Martin
Solvejg Wastvedt
Jeppe Lange
Ayoosh Pareek
Julian Wolfson
Bent Lund

Corresponding author

Correspondence to R. Kyle Martin.

Ethics declarations

Conflict of interest

None.

Ethical approval

Ethical review was waived because consent was obtained from all patients at the time of enrolment in the national hip arthroscopy registry.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Study was performed at the University of Minnesota, Minneapolis, MN, USA.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Martin, R.K., Wastvedt, S., Lange, J. et al. Limited clinical utility of a machine learning revision prediction model based on a national hip arthroscopy registry. Knee Surg Sports Traumatol Arthrosc 31, 2079–2089 (2023). https://doi.org/10.1007/s00167-022-07054-8

Received: 19 May 2022
Accepted: 10 June 2022
Published: 10 August 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s00167-022-07054-8
