1 Introduction

Head and neck cancers (HNCs) ranked as the seventh most common malignancy worldwide in 2020, accounting for 4.9% of all cancer cases (931,931 incident cases) and 4.7% of all cancer-related deaths (467,125 deaths) [1]. Most of the global disease burden of HNCs results from years of life lost, which account for 95.9–98.5% of the total depending on the cancer site. The remaining 1.5–4.1% is attributable to years lived with disability [2]. For individual patients, survival is typically the primary concern [3]. Prognostic models have been developed and validated to predict individualized survival in HNCs [4,5,6], but these models have often relied on data from hospital-based cancer registries or medical records in Western countries [4,5,6]. Racial disparities in survival, particularly between white and black HNC patients, have been well documented in clinical studies [7, 8]. The applicability of the current models to other racial populations might therefore be limited.

The Taiwan Cancer Registry (TCR) is a population-based cancer registry established by the government of Taiwan in 1979 [9]. Its primary purpose is to collect detailed clinical information and demographic characteristics of newly diagnosed cancer patients to support cancer surveillance and control [10]. Over time, the TCR has steadily improved its data quality and expanded its content [9, 10]. It is internationally recognized as one of the highest-quality cancer registries worldwide and covers nearly 100% of the national population [11]. The TCR has been used to develop prognostic models for several cancers in Asian populations, including colon and breast cancers [12, 13]. Given that it registers more than ten thousand new HNC cases each year, the TCR also holds substantial potential for developing a reliable prognostic model tailored to HNCs in Asian populations.

This study aimed to develop population-based prognostic models for predicting survival in HNCs. The models were constructed using demographic, clinical, and pathological information on HNC patients recorded in the TCR between 2013 and 2018 (N = 49,137). The models were then validated using data from a racially diverse population in the Surveillance, Epidemiology, and End Results (SEER) cancer registry database. This external validation can help clarify whether race influences survival prediction in HNCs and can confirm whether the models are effective specifically in Asian populations.

2 Materials and methods

We employed a study methodology that involved patient selection, model development, model evaluation, and external validation. Figure 1 illustrates the entire process flow.

Fig. 1 Study process flow diagram

2.1 Patients in Dataset

We collected data on 67,828 patients diagnosed with HNCs between January 1, 2013, and December 31, 2018, from the TCR, with follow-up through December 31, 2020. HNCs were defined according to the International Classification of Diseases for Oncology, third edition (ICD-O-3) [14]. The tumor site categories were oral cavity (C00: lip; C01–C02: tongue; C03: gum; C04: floor of mouth; C05: palate; C06: other and unspecified parts of mouth), oropharynx (C09–C10), hypopharynx (C12–C13), pharynx and ill-defined sites in lip, oral cavity, and pharynx (C14), salivary glands (C07–C08), nasopharynx (C11), and larynx (C32). We excluded patients who did not receive any treatment at the reporting hospital, duplicate data entries, patients with hematologic neoplasms, patients who died before receiving treatment, patients with unknown tumor size, patients diagnosed with stage 0 disease or carcinoma in situ, and patients younger than 18 years at diagnosis. A total of 49,137 eligible patients were included in the analysis. This study was approved by the Institutional Review Board of National Taiwan University Hospital (201910027W).
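The site grouping and exclusion rules above can be expressed as a simple record filter. The sketch below is illustrative only and is not the registry's processing code: the `Record` class and all of its field names are hypothetical, duplicates are detected by a hypothetical `patient_id` key, and hematologic neoplasms are represented by a precomputed flag because ICD-O-3 identifies them through morphology (histology) codes rather than topography. Only the first three characters of the topography code are used to assign the site category.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# ICD-O-3 topography prefixes mapped to the site categories in Section 2.1.
SITE_GROUPS = {
    "oral cavity": ("C00", "C01", "C02", "C03", "C04", "C05", "C06"),
    "oropharynx": ("C09", "C10"),
    "hypopharynx": ("C12", "C13"),
    "pharynx and ill-defined sites": ("C14",),
    "salivary glands": ("C07", "C08"),
    "nasopharynx": ("C11",),
    "larynx": ("C32",),
}
PREFIX_TO_SITE = {p: site for site, prefixes in SITE_GROUPS.items() for p in prefixes}


@dataclass
class Record:
    """Hypothetical registry record; field names are illustrative only."""
    patient_id: str
    topography: str                  # ICD-O-3 topography code, e.g. "C02.1"
    age_at_diagnosis: int
    stage: str                       # "0", "is" (in situ), "I", "II", "III", "IV"
    tumor_size_mm: Optional[float]   # None when tumor size is unknown
    treated_at_reporting_hospital: bool
    died_before_treatment: bool
    hematologic_neoplasm: bool       # assumed precomputed from morphology codes


def site_category(topography: str) -> Optional[str]:
    """Return the HNC site category, or None if the code is not an HNC site."""
    return PREFIX_TO_SITE.get(topography.strip().upper()[:3])


def is_eligible(r: Record) -> bool:
    """Apply the inclusion and exclusion criteria listed in Section 2.1."""
    return (
        site_category(r.topography) is not None
        and r.treated_at_reporting_hospital
        and not r.died_before_treatment
        and not r.hematologic_neoplasm
        and r.tumor_size_mm is not None
        and r.stage not in {"0", "is"}
        and r.age_at_diagnosis >= 18
    )


def select_cohort(records: Iterable[Record]) -> List[Record]:
    """Drop duplicate entries (first occurrence kept), then keep eligible records."""
    seen = set()
    cohort = []
    for r in records:
        if r.patient_id in seen:
            continue
        seen.add(r.patient_id)
        if is_eligible(r):
            cohort.append(r)
    return cohort
```

Keeping the site map as data rather than nested conditionals makes it easy to check against the ICD-O-3 list in the text.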

2.2 Variables in Dataset

We analyzed variables recorded in the TCR that could potentially predict overall survival (OS) and head and neck cancer-specific survival (CSS). These variables included age, sex, body mass index (BMI), year of diagnosis, tumor site, tumor size, depth of invasion (DOI), maximum lymph node diameter, American Joint Committee on Cancer clinical and pathological stage (I, II, III, IV), and treatment modalities, namely surgery, chemotherapy, and radiotherapy (yes, no). Behavioral factors, namely alcohol consumption, tobacco smoking, and betel nut chewing (yes, no), were also considered.

For external validation, BMI, the site-specific features (DOI and maximum lymph node diameter), clinical stage, and behavioral factors were excluded from the models because they are not recorded in the SEER database. Race information, however, was included to evaluate differences in prediction performance.

2.3 Model Development

Three types of prognostic models were developed using the TCR data. Model 1 (surgical model) included all patients who underwent surgery, except those with nasopharyngeal carcinoma. Model 2 (oral model) included patients with oral cavity cancer who underwent surgery, who represented the majority of surgical patients. Model 3 (non-surgical model) included all non-surgical patients, except those with salivary gland cancer. Each model was constructed using Cox proportional hazards regression to estimate the hazard ratio associated with each variable [15]. Variables that were significant in univariate Cox regression were then entered as predictors into the multivariate Cox prognostic model. Both OS and CSS were analyzed as outcomes for Models 1, 2, and 3, yielding six prognostic models in total to comprehensively evaluate model performance on the TCR data for HNCs. All models were built using the 'survival' package in R 4.1.3, a programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing [16]. A P value < 0.05 was considered statistically significant.
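To illustrate the univariate screening step, the sketch below fits a Cox model with a single numeric covariate by maximizing the log partial likelihood with Newton–Raphson, and reports the coefficient, the hazard ratio exp(β), and a two-sided Wald P value. It is a teaching sketch under stated assumptions, not the authors' code: the study used R's survival package, whose `coxph` function handles tied event times with the Efron approximation by default, whereas this sketch uses the simpler Breslow approximation, so estimates can differ slightly when ties are present. The function name `univariate_cox` is hypothetical, the sketch omits multivariate fitting, and it does not guard against monotone likelihoods (perfectly separated data), where β diverges.

```python
import math
from typing import Sequence, Tuple


def univariate_cox(
    times: Sequence[float],
    events: Sequence[int],
    x: Sequence[float],
    max_iter: int = 50,
    tol: float = 1e-10,
) -> Tuple[float, float, float]:
    """Fit h(t | x) = h0(t) * exp(beta * x); return (beta, hazard ratio, Wald P value).

    Ties in event times use the Breslow approximation: every subject still
    under observation at an event time belongs to that event's risk set.
    """
    n = len(times)
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the log partial likelihood
        info = 0.0   # observed information (negative second derivative)
        for i in range(n):
            if not events[i]:
                continue
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:  # j is at risk at event time t_i
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean_x = s1 / s0  # risk-weighted mean covariate in the risk set
            score += x[i] - mean_x
            info += s2 / s0 - mean_x * mean_x
        if info <= 0.0:
            raise ValueError("covariate does not vary within risk sets")
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)  # information from the last iteration
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, math.exp(beta), p_value
```

In a screening loop, each candidate variable would be passed to this function in turn, and those with P < 0.05 retained for the multivariate model; categorical variables would first need dummy coding.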

2.4 Model Evaluation and Validation

The performance of the prognostic models was first evaluated internally using tenfold cross-validation, in which patients were randomly partitioned into 10 groups and performance was averaged across folds [17]. Model evaluation covered discrimination and calibration [18]. For discrimination, Harrell’s c-index was used to assess the concordance between predicted risk and observed survival times across pairs of patients [19]; a c-index above 0.7 was taken to indicate good performance. For calibration, the difference between the predicted and observed proportions of deaths among all patients at a given calibration year was tested using a test of proportions.
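The two discrimination ingredients described above, Harrell's c-index and a random tenfold partition, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: a pair is treated as comparable only when the patient with the shorter follow-up time had an event, pairs with tied times are skipped, tied risk scores count as half-concordant, and the default seed is arbitrary. Higher risk scores (for example, a Cox linear predictor) are assumed to mean higher hazard. The function names are hypothetical.

```python
import random
from typing import List, Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Fraction of comparable pairs in which the earlier failure has higher risk."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(len(times)):
            if times[i] < times[j]:  # i failed while j was still under observation
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def kfold_indices(n: int, k: int = 10, seed: int = 2023) -> List[List[int]]:
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]
```

In tenfold cross-validation, each fold serves once as the testing set while the model is refit on the other nine folds, and the ten resulting c-index values are averaged. A c-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering.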

The SEER database served as the external validation dataset, and the same discrimination and calibration indicators used for internal validation were applied. The dataset comprised HNC patients diagnosed between 2010 and 2015 in the SEER database [20], with follow-up through December 31, 2020. The inclusion criteria were likewise based on ICD-O-3 coding, and the exclusion criteria were identical to those applied to the TCR dataset. Race information was available in the SEER database, and three primary populations were examined (Asian, white, and black). The Asian population included Chinese, Japanese, Korean, Vietnamese, and Laotian subpopulations.

3 Results

3.1 Patient Characteristics

Table 1 summarizes the demographic and clinicopathological characteristics of the patients analyzed in the TCR and SEER datasets. Each dataset comprised approximately 50,000 patients, approximately 60% of whom underwent surgery. In the TCR dataset, the mean age was 56.5 years, and most patients were male (88.6%). The number of diagnosed patients increased slightly over time, at around 8,000 per year. Oral cavity cancer was the most common HNC, accounting for 58.1% of all patients and 78.4% of surgical patients. Tumors were notably smaller in surgical than in non-surgical patients (mean size 26.0 mm vs. 37.7 mm). Stages I and IV each accounted for more than 30% of surgical patients, whereas stage IV accounted for over 60% of non-surgical patients. Chemotherapy and radiotherapy were administered to only about 40% of surgical patients, compared with approximately 88% of non-surgical patients. Three-fourths of the patients had a history of tobacco smoking, whereas only about half reported alcohol consumption and betel nut chewing.

Table 1 Demographic and clinicopathological characteristics of the studied datasets

The SEER dataset had a composition broadly similar to that of the TCR dataset. The mean age was 63.3 years, and most patients were male (71.8%). Around 8,000 patients were diagnosed annually, with a slight upward trend over time. Oral cavity cancer was the most common HNC type, representing 49.5% of all patients and 56.3% of surgical patients. Mean tumor size differed notably between surgical (24.8 mm) and non-surgical patients (33.7 mm). According to pathological stage, stages I and IV each accounted for more than 30% of surgical patients. Rates of chemotherapy and radiotherapy were significantly higher among non-surgical patients. Regarding race, most patients were white (87.1%).

3.2 Prognostic Models

Six prognostic models were developed to predict OS and CSS in three patient groups: surgical patients (Model 1), surgical oral cancer patients (Model 2), and non-surgical patients (Model 3). For Model 1 with OS as the outcome, Table 2 lists the variables found significant in univariate Cox regression, together with their hazard ratios (HRs) and P values estimated in the multivariate Cox prognostic model. The corresponding results for the remaining five models are presented in Supplementary Tables S1–S5. In general, older age, male sex, lower BMI, and earlier year of diagnosis were associated with higher mortality hazards. Oropharyngeal cancer among surgical patients, palate cancer among surgical oral cavity cancer patients, and oral cavity cancer among non-surgical patients were associated with poorer survival. Larger tumor size, greater maximum lymph node diameter, and higher clinical and pathological stages were adverse prognostic factors. DOI was significant only in Model 2 (surgical oral cavity cancer). Chemotherapy was associated with slightly worse survival in Model 1 for both OS and CSS and in Model 3 for OS, whereas radiotherapy was associated with better survival in all models. Alcohol consumption was associated with worse survival in all models, whereas tobacco smoking showed the opposite association in Model 1.
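The HRs reported in Table 2 and Supplementary Tables S1–S5 follow from the multiplicative form of the Cox model. As a worked illustration, the model and the HR for a change of Δ units in a continuous covariate such as tumor size can be written as follows; the coefficient in the final line is hypothetical and is not taken from any table in this study.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\right)

\mathrm{HR}_k = \frac{h(t \mid x_k + 1,\ \text{other covariates fixed})}{h(t \mid x_k,\ \text{other covariates fixed})} = e^{\beta_k},
\qquad
\mathrm{HR}_k(\Delta) = e^{\Delta \beta_k} = \left(\mathrm{HR}_k\right)^{\Delta}

\text{Hypothetical example: } \mathrm{HR} = 1.01 \text{ per mm} \;\Rightarrow\; \mathrm{HR}(10\ \text{mm}) = 1.01^{10} \approx 1.105
```

Because the baseline hazard h_0(t) cancels in the ratio, each HR is constant over follow-up time, which is the proportional hazards assumption. An HR above 1 indicates a higher mortality hazard and an HR below 1 a lower hazard, in each case conditional on the other covariates in the model.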

Table 2 Results of Cox regression Model 1 (surgical model) for overall survival

3.3 Performance of the Models

The predictive performance of the models was assessed through internal and external validations, which included both discrimination and calibration analyses. Table 3 presents the results of discrimination analysis, encompassing the TCR training and testing datasets in internal validation, and the three primary populations from the SEER dataset in external validation. Generally, Harrell’s c-index exhibited stability in internal validation, with values exceeding 0.7 for both OS and CSS in both the training and testing datasets in all models. For external validation, the c-index generally ranged between 0.6 and 0.7. However, the values exceeded 0.7 for the Asian population when predicting OS in all models and CSS in Model 3, suggesting that the models are particularly effective for the Asian population.

Table 3 Harrell’s c-index values in discrimination analysis for overall survival and cancer-specific survival in TCR datasets (training and testing datasets) and SEER datasets (Asian, White, and Black populations)

The calibration analysis assessed whether the predicted number of deaths differed significantly from the observed number. Table 4 presents the results for Model 1 for OS; the results for Models 2 and 3 for OS and for all three models for CSS are summarized in Supplementary Tables S6–S10. Across calibration years, most differences between the predicted and observed proportions of deaths were less than 5% in both the TCR training and testing datasets, with nonsignificant P values for all models (Table 4 and Supplementary Tables S6–S10). This suggests that the models are well calibrated for predicting both overall and cancer-specific mortality in the Taiwanese population. Among the three primary populations in the SEER dataset, Model 1 showed strong calibration for OS, with most differences in proportion below 5%, particularly in the Asian and black populations, where the P values were nonsignificant (Table 4). Model 2 showed satisfactory calibration, with differences in proportion below 10% for both OS and CSS (Supplementary Tables S6 and S9). In contrast, Model 3 performed relatively poorly, especially in the white population, where some differences in proportion exceeded 20% (Supplementary Tables S7 and S10).
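The calibration comparison can be sketched as follows. Under a Cox model, a patient's predicted probability of death by calibration time t is 1 − S0(t)^exp(lp), where S0(t) is the baseline survival and lp is the patient's linear predictor; averaging these probabilities gives the predicted proportion of deaths. This sketch rests on several assumptions that are not confirmed as the study's exact procedure: S0(t) is taken to be the baseline survival at lp = 0 under whatever covariate centering was used, the observed proportion is computed crudely as deaths by time t divided by all patients (ignoring censoring before t), and the two proportions are compared with a pooled two-proportion z-test without continuity correction. R's `prop.test` applies Yates' continuity correction by default, so its P values can be slightly larger. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def predicted_mortality(baseline_survival_t: float, linear_predictors: Sequence[float]) -> float:
    """Mean Cox-predicted probability of death by t: 1 - S0(t) ** exp(lp)."""
    probs = [1.0 - baseline_survival_t ** math.exp(lp) for lp in linear_predictors]
    return sum(probs) / len(probs)


def observed_mortality(times: Sequence[float], events: Sequence[int], t: float) -> float:
    """Crude proportion of patients who died by time t (censoring before t ignored)."""
    deaths = sum(1 for ti, ei in zip(times, events) if ei and ti <= t)
    return deaths / len(times)


def two_proportion_test(p1: float, p2: float, n1: int, n2: int) -> Tuple[float, float, float]:
    """Pooled two-proportion z-test; returns (difference, z, two-sided P value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return p1 - p2, z, p_value
```

With both proportions computed on the same N patients, the returned difference corresponds to the "difference in proportion" discussed above. Note that with registry-scale samples even small absolute differences can reach significance, which is one reason to report the difference alongside the P value.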

Table 4 Results of calibration analysis for Model 1 (surgical model) for overall mortality in different populations

4 Discussion

This study introduced three types of prognostic models for three groups of HNC patients, namely surgical patients (Model 1), surgical oral cancer patients (Model 2), and non-surgical patients (Model 3), to predict both OS and CSS. The models were developed using data from Taiwan's national cancer registry on 49,137 patients diagnosed with HNCs between 2013 and 2018. In internal validation, the discrimination and calibration analyses showed consistently good performance in predicting HNC survival and mortality across all models and settings in the Taiwanese population. When the models were tested on SEER data from different populations in external validation, we observed that prediction performance differed by race. The models were most effective in the Asian population, particularly for predicting OS in surgical patients. These results support the efficacy of the presented models in Asian populations and underscore the importance of developing models tailored to specific demographic groups.

The models use only 15 variables from the cancer registry database and exclude information that is difficult to obtain. This may enhance the practicality and applicability of the findings, much as PREDICT, a prognostic algorithm for predicting OS and CSS in early-stage breast cancer in Britain, evolved into a useful online prediction tool [21, 22]. The American Joint Committee on Cancer (AJCC) has acknowledged the demand for more personalized probabilistic predictions and has established criteria for including and excluding risk models for individualized prognosis as part of the practice of precision medicine [23]. These AJCC acceptance criteria have been applied to review previously published prognostic nomograms in HNCs [24]. The prognostic models we introduced adhere to the AJCC criteria, which supports both the integrity of the results and their practical applicability.

It is important to clarify that the effects of the variables in the prognostic models, as indicated by the HRs in Table 2 and Supplementary Tables S1–S5, may not represent unbiased causal effects. These HRs were estimated as the exponentiated coefficients of the Cox regression model. However, in clinical settings the causal structure becomes considerably more complex when variables from multiple levels and domains are included, and undiscovered associations or interactions among these variables may lead to overestimation or underestimation of their true effects. Moreover, although we selected the 15 most relevant variables from the TCR dataset based on prior knowledge, other relevant factors may have been underestimated or omitted. In this study, the overall predictive capability of the models matters more than the individual effect of each variable; these effects could be assessed further in dedicated studies that control for confounding factors.

Beyond the registry variables used in this study, other prognostic models for HNCs have incorporated predictors from various domains, including autophagy-related genes [25], cancer stem cell markers [26], immune signatures [27], positron emission tomography/computed tomography (PET/CT) parameters [28], and radiomic features from magnetic resonance (MR) or CT scans [29,30,31]. Such information requires more effort to obtain because it is not routinely collected, but it may improve the predictive capability of the models. Regarding specific cancer types, the incidence of human papillomavirus (HPV)-associated oropharyngeal cancer, primarily caused by HPV type 16, has increased among younger individuals [32]. Patients with HPV-associated oropharyngeal cancer tend to be more sensitive to chemoradiotherapy, resulting in improved survival [32]. For oral cavity cancer, machine learning algorithms have been used to develop prediction models [33,34,35].

This study possesses several strengths. First, a relatively large sample size is involved, encompassing 49,137 HNC patients from the TCR, enhancing the precision of the model predictions. Second, the models have been confirmed to be effective within the Taiwanese and Asian populations, highlighting the significance of personalized survival predictions that take race into account in HNCs. Third, the models utilize only 15 commonly registered items as predictive variables, enhancing their clinical applicability and practicality.

This study also has some limitations. First, the performance of the models still has room for improvement, especially in predicting CSS, as some of the Harrell’s c-index values in discrimination analysis are below 0.7, and differences in proportion in calibration analysis exceed 10%. Second, despite the SEER database being one of the most comprehensive databases, covering approximately 50% of the US population, the SEER dataset used for external validation is predominantly composed of white patients. To enhance the robustness of our findings, the next step could involve exploring other databases to confirm the predictive efficacy in Asian populations. Third, distinguishing whether the racial disparities observed in model predictions in this study genuinely result from racial or ethnic backgrounds is challenging and necessitates cultural and social information to ascertain the underlying factors.

5 Conclusions

In summary, we have developed and validated population-based prognostic models for predicting HNC survival using the TCR data with a large sample size from Taiwan. In internal validation, the models exhibited favorable and stable performance among the Taiwanese population. In external validation, the models consistently demonstrated satisfactory performance in predicting OS in Asians. Our results emphasize the importance of tailored survival models for specific demographic groups, and provide practically applicable models for Asian HNC patients.