Introduction

Medullary thyroid carcinoma (MTC) is an uncommon type of thyroid cancer [1], and evidence to guide the management of MTC patients therefore remains limited. The overall prognosis of MTC is poorer than that of follicular cell-derived thyroid carcinomas [1]. MTC is generally divided into familial and sporadic types [2]. Several studies have demonstrated that patients with multiple endocrine neoplasia type 2B (MEN2B) have more advanced disease at presentation and lower cause-specific survival (CSS) than those with MEN2A [3, 4]. Additionally, RET proto-oncogene mutation at codon 918 (M918T) is associated with poor clinical outcomes [5]. Older age, male gender, larger tumor size, extrathyroidal extension (ETE), vascular invasion, and lymph node metastasis are risk factors for more progressive disease [6, 7]. However, no established method evaluates metastatic risk from basic clinical parameters. Recently, a histopathological grading system specific for MTC was established using basic histopathological features [8]. This grading system provides solid evidence for the risk stratification of MTC patients. However, other clinical features such as age, gross ETE, and lymph node metastasis can still be prognostic factors independent of the grading system. A more comprehensive evaluation can be achieved by combining these factors with histopathological grading.

An important factor in current treatment guidelines for MTC is the presence of distant metastasis (DM), which is an indication for systemic therapy [9]. The tumor-node-metastasis (TNM) classification and American Joint Committee on Cancer (AJCC) staging are the only well-established prognostic systems for MTC patients [10]; however, TNM classification requires knowledge of existing metastasis rather than estimating the metastatic tendency. Moreover, the system stratifies MTC less specifically than it does other histologic types of thyroid cancer. TNM categories are mainly designed to stratify patients with differentiated thyroid carcinomas, which are much more common [1]. Some studies have suggested that the TNM classification and the AJCC staging system for MTC should be re-evaluated [11, 12]. For these reasons, many studies have sought to elucidate the prognosis of MTC [6, 12, 13]. However, most of these studies focused on the survival patterns of MTC patients, whereas MTC metastatic risk has not been well studied.

In the present study, we analyzed a large population from the Surveillance, Epidemiology, and End Results (SEER) database. Using this large patient database, we developed a statistical tool that provides information about the metastatic risk of MTC and may assist clinical decision-making.

Method and Materials

We accessed the SEER 18 registries database for the period 2010–2015 and selected patients with a diagnosis of MTC. To screen for predictive variables of DM, we extracted the following variables: gender, age, race, tumor size, lymph node metastasis, MTC histologic subtypes, AJCC stage, surgery, chemotherapy, radiation, overall survival time, and vital status. Vascular invasion is not clearly reported in the database; therefore, we were unable to extract this variable. ETE can be divided into two categories: minimal ETE and gross ETE. Minimal ETE is defined as tumor invasion into the sternothyroid muscle or perithyroidal soft tissue [14, 15], while gross ETE is defined as invasion at least to the strap muscles (sternohyoid, sternothyroid, thyrohyoid, or omohyoid muscles) [16]. Because these definitions overlap in some situations, we combined minimal ETE and gross ETE into one category and defined ETE as (1) extension into perithyroidal soft tissue/connective tissue, or extension into (2) strap muscles and skeletal muscles, (3) parathyroid or nerves (recurrent laryngeal and vagus nerve), or (4) surrounding cartilage, trachea, esophagus, and vascular structures (jugular vein, carotid artery, and thyroid arteries and veins). If the record did not meet these characteristics, ETE was categorized as “no/unknown.” Patients with unavailable information about age, gender, race, lymph node or distant metastasis, or survival status were excluded.

Next, we divided our dataset into a learning cohort and a test cohort with a 4:1 learning:test ratio of patients. In the learning cohort, we used the Bayesian model averaging procedure to select the most informative variables for model development. We then performed 1000 random data splicings in the learning cohort, each of which divided the learning cohort into a training cohort and a regularization cohort at a 5:3 ratio. For each data splicing, a logistic regression model was fitted on the training cohort. We used calibration performance as the criterion for model selection [17]. We computed the mean absolute error and the 90th percentile absolute error between the model-estimated risk of metastasis and the bias-corrected risk of metastasis obtained by the Efron-Gong optimism bootstrap [18] for the training cohort (MAE1, AE1-90) and the regularization cohort (MAE2, AE2-90). These errors inversely reflect how well the models are calibrated. We combined all four errors into a single total calibration error score (CES). Each of the 1000 models, one per data splicing, was evaluated by the CES:

$$\begin{aligned}\text{Calibration error score (CES)} &= (\mathrm{MAE1})^{2}+(\text{AE1-90})^{2}\\ &\quad+(\mathrm{MAE2})^{2}+(\text{AE2-90})^{2}\end{aligned}$$
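The CES formula and the choice of the lowest-scoring data splicing can be illustrated with a minimal Python sketch. The function names and the error values below are hypothetical and are not taken from the study, whose analyses were run in R.

```python
def calibration_error_score(mae1, ae1_90, mae2, ae2_90):
    """Sum of the four squared calibration errors for one data splicing."""
    return mae1 ** 2 + ae1_90 ** 2 + mae2 ** 2 + ae2_90 ** 2


def select_best_splicing(errors):
    """Return (index, CES) of the splicing with the lowest CES.

    `errors` is a list of (MAE1, AE1-90, MAE2, AE2-90) tuples,
    one tuple per data splicing.
    """
    scores = [calibration_error_score(*e) for e in errors]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical error values for three splicings (illustration only).
example_errors = [
    (0.010, 0.030, 0.020, 0.040),
    (0.005, 0.015, 0.010, 0.020),
    (0.020, 0.050, 0.015, 0.030),
]
best_index, best_ces = select_best_splicing(example_errors)
```

Because every term is squared, a single large error in either cohort dominates the score, so a model must be well calibrated in both the training and the regularization cohort to rank first.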

We selected and evaluated the model with the lowest CES and examined its performance in the test cohort. We performed receiver operating characteristic (ROC) analyses and constructed calibration plots in the three cohorts to assess the model performance. Next, we created a model-based nomogram and a risk scoring table. To stratify the risk of metastasis, we calculated a model-based linear predictor (LP) for each patient in the training cohort and performed K-means clustering to identify clusters of patients with different risks. Total scores defining risk groups were assigned based on these clusters. Finally, to verify the validity of the risk table, we created a calibration table showing the observed probability of metastasis in each risk group and performed Kaplan–Meier analyses to compare survivorship among risk groups in the three cohorts.
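As an illustration of clustering patients by their linear predictor, the following is a minimal one-dimensional K-means sketch (Lloyd's algorithm) in plain Python. The quantile-based initialization and the LP values are assumptions made for this example; the study's actual clustering settings are not reported here.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm on scalar values; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start (an assumption): centers at evenly spaced quantiles.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(n_iter):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical linear predictors for nine patients (illustration only).
lp = [-4.1, -3.9, -4.0, -2.6, -2.4, -2.5, -1.0, -0.8, -0.9]
centers, labels = kmeans_1d(lp, k=3)

# LP range covered by each cluster, from which score cutoffs could be read.
lp_ranges = {
    j: (min(v for v, lab in zip(lp, labels) if lab == j),
        max(v for v, lab in zip(lp, labels) if lab == j))
    for j in range(3)
}
```

Since the LP is a single number per patient, the clusters are contiguous intervals on the LP axis, which is what allows them to be translated into score ranges.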

All analyses were performed using R software version 4.1.1 (The R Foundation, Vienna, Austria). Chi-squared tests and Wilcoxon tests were used to compare variables between cohorts. Hypothesis tests were considered significant when p < 0.05.

Results

Patient Selection and Data Splicing

We extracted a total of 3612 MTC patients from the database and then excluded patients with inadequate information on race (n = 9), lymph node metastasis (n = 121), distant metastasis (n = 1040), tumor size (n = 821), and survival data (n = 9). The remaining cohort (n = 2526), which fit the criteria of our study, was divided into a learning cohort (n = 2019) and a test cohort (n = 507). Table 1 summarizes the patient characteristics and variables. Overall, there were no significant differences between the two cohorts. The metastatic probabilities were 7.0% and 6.5% in the learning and test cohorts, respectively.

Table 1 Patient characteristics in the learning and test cohorts

Variable Selection

First, we included gender, age, race, tumor size, ETE, and lymph node metastasis as candidate predictive variables because they are related to DM. AJCC stage cannot be included in the DM-predicting model because DM itself is required for AJCC staging. We also included MTC histologic subtypes (MTC with amyloid stroma versus MTC, NOS) in the analysis. Because previous studies showed that desmoplastic stromal reaction, which is associated with amyloid deposition, is correlated with local progression [19, 20], we could not exclude the possibility that amyloid deposition is a predictive variable and, therefore, a confounder of our model.

We performed Bayesian model averaging to select informative variables for estimating the risk of distant metastasis. Age, gender, tumor size, ETE, and lymph node metastasis were selected. In addition, we performed univariate and multivariate analyses to evaluate the effect of each variable on the risk of metastasis in the learning cohort (Table 2). The strongest risk factor for MTC distant metastasis was lymph node metastasis (OR = 4.25; 95% confidence interval [95% CI] = 2.62–6.9; p < 0.001), followed by ETE (OR = 2.0; 95% CI = 1.34–2.98; p < 0.001), tumor size (OR = 1.8; 95% CI = 1.51–2.16; p < 0.001), age (OR = 1.57; 95% CI = 1.18–2.1; p < 0.001), and male gender (OR = 1.48; 95% CI = 1.01–2.17; p = 0.046). Because multifocality can be recorded inaccurately in patients who did not undergo surgery, owing to the lack of histologic confirmation, we performed an additional analysis in surgery-treated patients, which is shown in Supplementary Table 1. This analysis yielded results similar to those in Table 2.
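To relate the reported odds ratios to the log-odds scale on which a logistic model works, the following sketch converts an OR and its 95% CI into a coefficient and an approximate standard error. It assumes a Wald-type interval (coefficient ± 1.96 standard errors on the log scale), which the source does not state; the function name is hypothetical.

```python
import math


def log_odds_from_or(odds_ratio, ci_low, ci_high, z=1.96):
    """Return (beta, se) on the log-odds scale, assuming a Wald-type CI."""
    beta = math.log(odds_ratio)
    # The CI width on the log scale spans 2 * z standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


# Lymph node metastasis, values as reported in the text above.
beta_ln, se_ln = log_odds_from_or(4.25, 2.62, 6.9)

# Back-transforming should approximately reproduce the reported interval.
ci_back = (math.exp(beta_ln - 1.96 * se_ln), math.exp(beta_ln + 1.96 * se_ln))
```

Small differences between the back-transformed and reported limits are expected because the published values are rounded.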

Table 2 Univariate and multivariate analysis of metastatic risk in the learning cohort

Model Selection

In the learning cohort, we conducted 1000 random data splicings, resulting in 1000 training cohorts and 1000 corresponding regularization cohorts. The CESs of the 1000 models were computed (Supplementary Fig. 1). Of these, the model with the lowest CES (0.001), along with its corresponding training (n = 1264) and regularization (n = 755) cohorts, was used for further evaluation and risk stratification.

Fig. 1

Receiver operating characteristic (ROC) analyses of the model showing high areas under the curve (AUCs) in the training (a), regularization (b), and test (c) cohorts, which demonstrate the model’s high performance for estimating the presence of metastasis. The calibration plots of the optimal model in the training (d), regularization (e), and test (f) cohorts showing good calibration when the actual and model-estimated values were < 25%

Evaluation of the Optimal Model

To evaluate the model, we performed ROC analyses on each cohort. The areas under the curve (AUCs) of the model in the training, regularization, and test cohorts were 0.81 (95% CI = 0.78–0.85), 0.84 (95% CI = 0.78–0.89), and 0.84 (95% CI = 0.79–0.9), respectively (Fig. 1a-c). Next, we created calibration plots of the model in each cohort (Fig. 1d-f) to compare the probabilities estimated from 200 bootstrap resamples with the actual probability. Overall, the model is well calibrated when the probability is < 25% in all cohorts.
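The AUC can be read as the probability that a randomly chosen patient with metastasis receives a higher estimated risk than a randomly chosen patient without it. A minimal sketch of that definition follows; the labels and risk estimates are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = distant metastasis) and model-estimated risks.
y = [0, 0, 1, 0, 1, 1]
risk = [0.02, 0.10, 0.08, 0.05, 0.30, 0.40]
auc = roc_auc(y, risk)
```

This pairwise form is quadratic in the number of patients, which is acceptable for a sketch; rank-based implementations give the same value more efficiently.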

Nomogram and Risk Table Construction

To visualize the model, we constructed a partial nomogram to obtain the estimated values of metastatic risk (Fig. 2). Each variable is scored based on its estimating power in the model. We then introduced a risk table of score assignments using the nomogram score for each variable (Table 3). To stratify the metastatic risk, we performed K-means clustering to determine clusters of patients with different metastatic tendencies. The elbow method showed that the optimal K values ranged from 2 to 4 (Supplementary Fig. 2a). We selected K = 3 and calculated the LP range in each cluster (Supplementary Fig. 2b, c). Based on the LP range, patients with a total score of 0–13, > 13 to < 22, and ≥ 22 were stratified as low risk, intermediate risk, and high risk, respectively.
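The score cutoffs above translate directly into a lookup. The sketch below uses a hypothetical function name and takes the total score as input, because the per-variable scores in Table 3 are not reproduced in the text.

```python
def risk_group(total_score):
    """Map a total risk-table score to a risk group using the stated cutoffs."""
    if total_score < 0:
        raise ValueError("total score cannot be negative")
    if total_score <= 13:
        return "low"
    if total_score < 22:
        return "intermediate"
    return "high"


groups = [risk_group(s) for s in (0, 13, 14, 21, 22, 30)]
```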

Fig. 2

Model-based partial nomogram to predict the probability of distant metastasis. Each variable’s score is located on the single-variable scores line (uppermost line) directly above the variable’s result. After adding all single-variable scores for a total score and locating this number on the total scores line, the linear predictor and probability of distant metastasis are indicated on the corresponding lines directly below total scores

Table 3 Risk stratification table of metastatic risk in medullary thyroid carcinoma

Evaluation of the Risk Table

We calculated the observed probability of metastasis and its bootstrap-estimated 95% CI in each risk group from each cohort and report them in the calibration table (Table 4). The low-risk groups had observed metastasis risks of 0.88%, 1.3%, and 0.5% in the training, regularization, and test cohorts, respectively. In contrast, the high-risk groups had a substantial propensity for metastasis, with observed risks of 19.7%, 15.8%, and 17.8% in the three cohorts, respectively.
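One common way to obtain a bootstrap interval for an observed proportion is the percentile method, sketched below. Whether the study used the percentile method or another bootstrap variant is not stated, so this is an assumption, and the outcome vector is hypothetical.

```python
import random


def bootstrap_proportion_ci(outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Observed proportion with a percentile bootstrap confidence interval.

    `outcomes` holds 1 (metastasis) or 0 (no metastasis) per patient.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample patients with replacement and record each resample's proportion.
    props = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lower = props[int((alpha / 2) * n_boot)]
    upper = props[int((1 - alpha / 2) * n_boot) - 1]
    return sum(outcomes) / n, lower, upper


# Hypothetical risk group: 20 of 100 patients with distant metastasis.
group = [1] * 20 + [0] * 80
observed, ci_low, ci_high = bootstrap_proportion_ci(group)
```

In small groups with few events, such as a low-risk group, the percentile interval can be wide or have a lower limit of zero, which should be kept in mind when reading a calibration table.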

Table 4 Risk of metastasis in risk groups

To further examine the validity of the risk table, we performed Kaplan–Meier analysis to compare the survivorship in each risk group (Fig. 3). The risk groups showed a good survival stratification in the total MTC population, surgery-treated patients, and radiation-treated patients (p < 0.001). The analysis in chemotherapy-treated patients (n = 112) did not yield a significant result (p = 0.59). Most of the chemotherapy-treated patients were in the high-risk group (n = 85), followed by intermediate (n = 21), and low risk (n = 6). In the entire studied cohort, 5-year survival rates were 93.6% (95%CI = 92–95.2%), 84.3% (95%CI = 81.1–87.5%), and 70.8% (95%CI = 67.2–74.5%) in the low-, intermediate-, and high-risk group, respectively (p < 0.001). In patients who were not surgically treated, the 5-year survival rates were 54% (95%CI = 36.3–80.3%), 23% (95%CI = 10.7–49.5%), and 19.3% (95%CI = 9.6–39%) in the low-risk (n = 24), intermediate-risk (n = 34), and high-risk (n = 51) patients, respectively (p = 0.11).
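For readers unfamiliar with the survival estimates quoted above, the Kaplan–Meier product-limit estimator can be sketched in a few lines. The follow-up times and event indicators below are hypothetical, and the function is a simplified illustration rather than the procedure used in the study.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    `events` holds 1 for death and 0 for censoring at the matching time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up in months for eight patients.
months = [6, 12, 12, 24, 36, 48, 60, 60]
died = [1, 0, 1, 1, 0, 1, 0, 0]
km_curve = kaplan_meier(months, died)
```

Censored patients leave the risk set without lowering the curve, which is why the estimate at 48 months here (0.4) is lower than the crude proportion of patients who did not die (4 of 8).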

Fig. 3

Kaplan–Meier curves illustrating the survival difference among risk groups for all patients, surgery-treated patients, radiation-treated patients, and chemotherapy-treated patients. The 6th and 7th AJCC survival curves illustrate the survival difference among patients with different 6th and 7th edition AJCC stages, respectively

To compare the survival discrimination of our risk table with that of the AJCC staging systems, we also performed Kaplan–Meier analyses across the stages reported according to the AJCC 6th and 7th editions (Fig. 3). Because AJCC 8th edition staging, which was reported from 2015, is unavailable in our cohort, we were unable to compare that edition’s staging system with our risk table. Although the log-rank tests on the survivorship of AJCC stages were significant in the Kaplan–Meier analyses, these results were driven by the distinctly poor survivorship of AJCC stage IV patients. To illustrate the poor discrimination among the remaining stages, we repeated the log-rank tests on AJCC stages I, II, and III only. These tests were not significant (p = 0.5 and p = 0.5 for the 6th and 7th AJCC editions, respectively).

Comparative Analysis

The International MTC Grading System (IMTCGS) is a recent, well-established method for stratifying MTC patients by risk [8]. The system uses only histologic factors, including mitosis, tumor necrosis, and the Ki67 proliferation index. Table 5 compares our risk stratification table with IMTCGS. First, the IMTCGS-adjusted multivariate Cox analysis (results from the previous study) showed that age remained a prognostic factor independent of IMTCGS. Although tumor size and microscopic ETE were not prognostic factors in that analysis, we noted that AJCC stages were included in the multivariate analysis, which can reduce the prognostic significance of tumor size and ETE because AJCC stage is an obvious confounder of these characteristics. Another issue is that macroscopic ETE was not included in the analysis, so the survival impact of this factor could not be observed. Next, we compared the overall survival rates at 3, 5, and 10 years of follow-up. Overall, low-risk and high-risk patients in our risk table have prognoses similar to those of low-grade and high-grade patients categorized by IMTCGS. The intermediate-risk patients remain a significantly distinct group in terms of 3-year, 5-year, and 10-year prognoses. Finally, we compared the metastatic risks stratified by our risk table and by IMTCGS. Because our table focuses on metastasis at presentation, this is not a direct comparison of the two tables. The IMTCGS low-grade and high-grade patients had 3-year metastatic risks of 2% and 27%, while our low-risk and high-risk patients had metastatic risks of 1% and 20%.

Table 5 Comparison of study’s risk table (Table 3) and IMTCGS

Discussion

The present study used a large number of patients to yield meaningful statistical inference about the effect size of each variable in quantitatively estimating metastatic risk. We created a risk table of distant metastasis that can support oncologists when screening for the presence of metastatic foci. In the current National Comprehensive Cancer Network (NCCN) guidelines, the mainstay of MTC management is total thyroidectomy with or without regional lymph node dissection. The decision on whether to perform lymph node dissection depends on local imaging investigation and serum calcitonin levels. Postoperative radiotherapy can also be indicated when there is ETE and residual MTC. Targeted therapy with tyrosine kinase inhibitors (TKIs) is recommended in MTC patients with metastatic disease [9]. In practice, pre-therapeutic measurement of serum calcitonin level is recommended to evaluate the metastatic risk of MTC. Patients with a calcitonin level < 500 pg/ml had almost no risk of metastasis [21], and higher calcitonin concentrations indicated a higher risk of metastasis. Current MTC management guidelines recommend using calcitonin as a screening marker for further imaging investigations to detect metastatic foci [9]. The presence of metastatic foci is therapeutically important because it leads to the initiation of systemic therapy such as TKIs [9, 22]. Our risk table of distant metastasis, in combination with serum calcitonin measurement, can be useful in this context. In addition, it is worth mentioning that IMTCGS uses basic histopathological features (mitosis, Ki67 index, and tumor necrosis) to stratify patients into low-risk and high-risk groups in terms of disease-free survival, distant metastasis, and overall survival [8]. We believe that a combination of the two tables (as clinical scores and histopathological scores) may yield a more precise and comprehensive risk staging system for MTC patients; we could not combine that information in this study because of the limitations of the SEER database. In their original study, the authors of the IMTCGS showed that gender, tumor size, and microscopic ETE were not significant in the multivariate analysis adjusted for IMTCGS scores. Therefore, incorporating the two systems would require adjusting our risk table to focus on other significant features such as age, ETE, and lymph node metastasis. Nevertheless, our risk table can still be used on its own in pre-operative contexts, where a full IMTCGS evaluation cannot be performed.

In our model, the calibration plots illustrated that a detailed estimated probability from the nomogram may deviate somewhat from the actual probability, especially when the estimated and actual values are more than 25%. However, this deviation does not substantially affect the performance of the risk table for differentiating the risk groups since the metastatic risk at the cutoff scores was well calibrated (under 25%). Therefore, it is reasonable to make a risk stratification instead of obtaining the exact nomogram-estimated risk value. The risk table is strongly recommended when a given patient belongs to the high-risk group since the model miscalibration occurred mainly in the high-risk patients whose metastatic risk was more than 25%.

Another crucial piece of information provided by the table is the difference in survival patterns between risk groups. The low-risk patients had a more favorable outcome, while the high-risk MTC group had a shorter survival time. In the present study, chemotherapy-treated patients in different risk groups did not show survival differences. Most of the chemotherapy-treated patients belonged to the high-risk group, which can be explained by the indications for chemotherapy in MTC [9]. Patients with no surgical intervention also showed no significant difference in survivorship. Even so, high-risk and intermediate-risk patients tended to have a lower survival rate than the low-risk group. It is not clear why these patients did not undergo surgical treatment or whether this affected the metastatic risk. Compared to the current TNM classification, our risk table showed a more discriminative classification [11]. However, it should be clarified that our risk table was not specifically trained and validated on patient outcomes.

In addition to our model, previous SEER-based studies also established predictive models for MTC [13, 23]. Those models differ from the present one: their goal was to predict survival rates of MTC patients rather than metastatic risk, and they incorporate metastasis as a predictive variable. In contrast, our model can provide both metastatic risk and prognostic information. Although investigating patient outcomes is the main focus of MTC studies, evaluating metastatic risk is also important and may be useful in certain clinical contexts.

It is noteworthy that our model and multivariate analysis yielded somewhat inconsistent results. That is, the order of magnitude of the estimating power of some selected variables (gender, tumor size, and extrathyroidal extension) changed because we performed the multivariate analysis in the learning cohort and then established the model based on the training cohort. We selected the model that had the least biased performance to avoid overfitting, which can occur during model training.

Previous studies have shown that the clinical parameters of our risk table promote the course of MTC progression [6, 7]. Men have a higher risk of developing advanced-stage MTC as well as other histologic types of thyroid cancer [6, 24]. A SEER-based multivariate analysis study showed that male patients tended to have a poorer survival rate due to advanced stage and aggressive behavior. The reason for the higher tendency to develop advanced-stage disease in men is not known. However, another study showed that male gender is an independent risk factor in MTC in both univariate and multivariate analyses [25]. Age has been shown to be a strong independent prognostic factor and a risk factor for lymph node metastasis in MTC [25,26,27]. Our results were consistent with those of previous studies. We presented direct evidence and a quantitative measure of the significant relationship between age and the occurrence of MTC distant metastasis. Tumor size and ETE are also risk factors for MTC lymph node and distant metastasis [6, 25, 28]. These factors are also included in the TNM classification [16]; however, they are incorporated into one parameter, the T stage. Our multivariate analysis illustrated that tumor size and ETE can independently contribute to the risk of metastasis. A previous study also showed that gross ETE was an independent risk factor for distant metastasis and local recurrence in a multivariate analysis adjusted for lateral neck involvement, lymph node metastasis, and serum calcitonin [25]. In the present study, lymph node metastasis was the most powerful estimating factor of MTC distant metastasis. Although multifocality is related to lymph node metastasis [28], we showed that multifocality did not significantly increase the risk of distant metastasis in multivariate analysis. This can be explained by lymph node metastasis acting as an obvious confounder, since multifocal tumors have a higher risk of lymph node metastasis.

While the present study provided a robust model for estimating the risk of MTC distant metastasis, it has certain inherent limitations. First, the SEER project does not provide information about sporadic and familial types of MTC. The status of germline mutations in familial types of MTC was a potential confounder of certain variables in our study, such as age and multifocality. The risk table may need to be modified when the MTC type is known. Given the recent changes in treatment modalities, it is also important to examine the risk table in the context of new developments in targeted therapies. Second, even though the data were carefully evaluated through many steps of analysis, model testing in an external dataset is necessary before widespread application. Third, we were unable to include vascular invasion, imaging, histopathological findings, and genetic analyses from the SEER database. These are potential confounders and should be interpreted along with our risk table if possible. Finally, it is noteworthy that our model can only provide information about metastasis at the time of diagnosis and cannot predict future metastasis. Further analyses are required to validate the model for predicting metastasis during follow-up.

Conclusion

In summary, our risk table can be an adjunct tool for interpreting tumor aggressiveness using basic clinical information. The table can be used in combination with tumor biomarkers, imaging features, and histopathological and genetic findings to obtain a comprehensive picture of MTC biological behavior.