Introduction

Liver cancer ranks fourth among malignancies in worldwide mortality, accounting for about 782,000 deaths each year, of which 85% are due to hepatocellular carcinoma (HCC) [1]. At present, surgery is the most important curative treatment for patients with HCC, but the 5-year recurrence rate exceeds 50%, and the overall 5-year survival rate is only 18% [2, 3]. How, then, can postoperative recurrence be reduced and postoperative survival improved in HCC patients? Recently, adjuvant therapy has been shown to improve survival after HCC surgery. In a study of 200 patients who had undergone surgery for HCC, adjuvant transarterial chemoembolization significantly improved disease-free survival in patients with tumor size > 5 cm [4]. In a systematic review of 277 patients after HCC surgery, adjuvant immunotherapy was found to reduce the recurrence rate [5]. Some trials have also found that antiviral therapy can improve the prognosis of patients with HBV or HCV infection after HCC surgery [6, 1].

Table 1 Clinical characteristics of HCC patients (SEER 2010–2015)

Determination of independent risk factors

Univariate (Fig. 2A) and multivariate (Fig. 2B) logistic regression analyses were conducted in the training group to identify independent risk factors. Univariate analysis of the clinical parameters showed that marital status, grade, AFP level, vascular invasion, tumor size, number of lesions, and T stage were associated with 5-year CSD. Multivariate analysis showed that marital status, grade, AFP level, vascular invasion, tumor size, and number of lesions were independent risk factors for 5-year CSD. Being married was a favorable prognostic factor for HCC, whereas AFP positivity and vascular invasion indicated a poor prognosis. In addition, the lower the degree of differentiation, the larger the tumor, and the greater the number of tumors, the worse the prognosis.
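As an illustration of what a univariate logistic regression reports for a single binary predictor (for example, vascular invasion: yes/no), the sketch below computes the odds ratio and a Wald 95% confidence interval directly from a 2 × 2 table; for one binary predictor this is equivalent to the fitted logistic model. The function name and the counts are hypothetical and are not taken from the study data.

```python
import math


def odds_ratio_with_ci(exposed_events, exposed_total,
                       unexposed_events, unexposed_total, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    For a single binary predictor, this equals exp(beta) from a
    univariate logistic regression, and the interval equals the
    Wald interval for exp(beta).
    """
    a = exposed_events                       # exposed, event
    b = exposed_total - exposed_events       # exposed, no event
    c = unexposed_events                     # unexposed, event
    d = unexposed_total - unexposed_events   # unexposed, no event
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical counts (NOT study data): 60 events among 100 exposed
# patients and 40 events among 200 unexposed patients.
example_or, ci_low, ci_high = odds_ratio_with_ci(60, 100, 40, 200)
print(f"OR = {example_or:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

A predictor whose interval excludes 1 in this kind of analysis would typically be carried forward to the multivariate model, where the odds ratios are adjusted for the other covariates and can no longer be read off a single table.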

Fig. 2

Univariate and multivariate analysis of variables associated with CSD. Univariate (A) and multivariate (B) logistic regression analysis for risk factor identification in the training group

Construction and verification of decision tree model

The independent risk factors identified by multivariate logistic regression in the training group were used to construct a risk prediction model for 5-year CSD with a decision tree algorithm. The resulting model is shown in Figs. 3 and 4. Figure 3 shows the classification of patients without vascular invasion. Of these patients, 171 (16.3%, 171/1047) were at high risk of 5-year CSD. Tumor size > 5 cm was a risk factor for 5-year CSD (32.5%, 129/397), and patients with poorly differentiated or undifferentiated tumors formed a high-risk group (77.9%, 74/95). Figure 4 shows the classification of patients with vascular invasion. Of these patients, 166 (65.6%, 166/253) were at high risk of 5-year CSD. Consistent with the results above, tumor size > 5 cm (55%, 44/80) and poorly differentiated or undifferentiated tumors (93%, 93/100) were the main risk factors for 5-year CSD after liver cancer surgery.

We then plotted the calibration curve of the model and found that the model fit the data well (Fig. 5A). Comparing the ROC curves (Fig. 5B) of the decision tree and logistic regression showed that the decision tree model (AUC = 0.76) had stronger predictive ability than logistic regression (AUC = 0.679). We then determined the model threshold (0.64) from the precision and recall (Fig. 5C). Patients were classified as high risk (survival rate ≤ 0.64) or low risk (survival rate > 0.64) according to this threshold. At this threshold, the F1 score was 0.836 (Fig. 5D) and the classification accuracy was 0.752 (Fig. 5E). In the validation set, at the same threshold, the AUC, classification accuracy, precision, recall, and F1 score were 0.729, 0.757, 0.873, 0.824, and 0.848, respectively (Table 2). According to the model, all patients with HCC undergoing surgery (n = 1625) could be divided into two groups: 413 in the high-risk group and 1212 in the low-risk group (Additional file 1). These data suggest that the decision tree model had good predictive performance.
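The threshold rule and the evaluation metrics reported above can be sketched as follows. This is a minimal illustration, assuming that the model outputs a survival probability per patient; the function names and the confusion-matrix counts are hypothetical, and the text does not state which class was treated as positive when precision and recall were computed. The last lines check that the reported validation precision (0.873) and recall (0.824) are consistent with the reported F1 score (0.848).

```python
def classify(survival_prob, threshold=0.64):
    """Assign a risk group from a predicted survival probability.

    Mirrors the rule in the text: survival rate <= 0.64 is high risk,
    survival rate > 0.64 is low risk.
    """
    return "high" if survival_prob <= threshold else "low"


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_score(precision, recall)


# Hypothetical counts (NOT study data), only to show the call.
toy_precision, toy_recall, toy_f1 = precision_recall_f1(tp=8, fp=2, fn=2)

# Consistency check using the validation metrics reported in the text.
validation_f1 = f1_score(0.873, 0.824)
print(round(validation_f1, 3))  # 0.848, matching the reported F1 score
```

Because F1 depends only on precision and recall, recomputing it from the two reported values is a quick way to confirm that the entries in a metrics table are internally consistent.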

Fig. 3

Decision tree model to predict 5-year CSD in HCC patients (without vascular invasion). Green represents low risk; red represents high risk

Fig. 4

Decision tree model to predict 5-year CSD in HCC patients (with vascular invasion). Green represents low risk; red represents high risk

Fig. 5

Evaluation results of the model in the training cohort. A The calibration curve of the model. B The ROC of the model and logistic regression. C The precision-recall curve determines that the threshold of the model is 0.64. D The F1 score of the model. E The classification accuracy of the model

Table 2 Evaluation results of the model in the training and internal testing cohorts

Effect of surgery combined with chemotherapy on high-risk and low-risk patients

To further explore the effect of surgery combined with chemotherapy on the prognosis of HCC patients, the high-risk and low-risk groups were each divided into two subgroups according to whether the patients had received chemotherapy. In the high-risk group, AFP differed significantly between patients treated with surgery alone and those treated with surgery combined with chemotherapy. To eliminate this confounding factor, we applied propensity score matching (PSM). After PSM, there was a significant difference in 5-year CSD between the two subgroups. The 5-year survival rate was 15.5% (11/71) in patients treated with surgery alone and 35.2% (25/71) in patients treated with surgery combined with chemotherapy (Table 3). In the low-risk group, AFP, number of lesions, and grade differed significantly between surgery alone and surgery combined with chemotherapy. We again used PSM to eliminate these confounding factors and found no difference in 5-year CSD between the two subgroups (Table 4). These data suggest that surgery combined with chemotherapy can significantly improve the prognosis of HCC patients in the high-risk group but has no effect on the prognosis of HCC patients in the low-risk group.
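The matching step and the subsequent comparison can be illustrated with a minimal sketch. It assumes greedy 1:1 nearest-neighbor matching without replacement within a caliper, and an uncorrected Pearson chi-square test on the matched groups; neither choice is stated in this section, so both are assumptions, as are the function names, the caliper value, and the toy propensity scores. Only the counts 11/71 and 25/71 come from the text.

```python
import math


def greedy_match(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    treated, control: dicts mapping patient id -> propensity score.
    Each control is used at most once; a treated patient stays
    unmatched if no remaining control lies within the caliper.
    """
    available = dict(control)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns the statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy propensity scores (hypothetical, NOT study data).
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.90}
control = {"c1": 0.28, "c2": 0.60, "c3": 0.33, "c4": 0.10}
pairs = greedy_match(treated, control)  # t3 has no control within the caliper

# Reported matched high-risk group: 11 of 71 survived 5 years after
# surgery alone, 25 of 71 after surgery combined with chemotherapy.
chi2, p_value = chi_square_2x2(11, 71 - 11, 25, 71 - 25)
print(pairs, round(chi2, 2), p_value < 0.05)
```

Under these assumptions the reported counts give a difference that is significant at the 0.05 level, in line with the statement in the text; a different test, such as one that accounts for the matched pairs, could give a different p-value.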

Table 3 Characteristics of high-risk HCC patients
Table 4 Characteristics of low-risk HCC patients with AJCC 8th edition stages 1–3

Discussion

Advances in surgical resection, ablation, and liver transplantation have improved the prognosis of HCC patients to some extent, but compared with other common human cancers, the long-term survival rate of HCC patients remains unsatisfactory because of the high recurrence rate and the lack of effective adjuvant therapy [20, 21]. Postoperative patients therefore need risk-stratified management and targeted treatment according to their risk level in order to improve the long-term survival rate of patients with liver cancer. In this study, univariate and multivariate logistic regression analysis identified tumor size, vascular invasion, AFP level, and number of lesions as independent risk factors for 5-year CSD. Being married was a favorable prognostic factor for HCC, whereas AFP positivity and vascular invasion indicated a poor prognosis. In addition, the lower the degree of differentiation, the larger the tumor, and the greater the number of tumors, the worse the prognosis. Previous studies have shown that tumor size, vascular invasion, AFP level, and number of lesions may affect the prognosis of patients with HCC, which is consistent with the results of this study [22,23,24]. Interestingly, marital status was also an independent risk factor for 5-year CSD in this study. This is in keeping with a previous report that married patients had better 5-year HCC cause-specific survival than unmarried patients (46.7% vs 37.8%) [25]. Marital status is an important prognostic factor for survival in patients with HCC treated with surgical resection.

Postoperative prognostic models for HCC have been reported previously. Shim et al. established a survival nomogram for postoperative HCC patients (AUC = 0.66) [26]. The present study also constructed a logistic regression model (AUC = 0.679). By comparison, the decision tree model in this study (AUC = 0.760) had better predictive performance and therefore seems to have greater potential for clinical application. In the present study, vascular invasion, tumor size, and poor differentiation were the main risk factors for 5-year CSD in HCC patients after surgery, which is in keeping with previous studies [27, 28]. The prognosis of patients with vascular invasion, tumor size > 5 cm, or poorly differentiated tumors is poor. The decision tree prediction model in this study can identify patients at high risk of 5-year CSD after HCC surgery, help to realize patient-specific early diagnosis and treatment, and further improve the prognosis of HCC patients.

In recent years, some studies have found that surgical resection of HCC combined with chemotherapy can improve the postoperative survival rate [29,30,31]. However, no clinical guidelines recommend the routine use of surgery combined with chemotherapy for HCC patients, because it remains uncertain which patients benefit. In this study, patients were divided into high-risk and low-risk groups by the decision tree model. In the high-risk patients, the prognosis was significantly improved by surgery combined with chemotherapy, whereas in the low-risk patients, surgery combined with chemotherapy produced no significant change in 5-year CSD. This means that the prognostic model established in this study can provide a reference for guiding the management of postoperative adjuvant chemotherapy.

The data source of this study is the SEER database, which is an important resource for practical research in oncology. A total of 1625 HCC patients with complete clinical data were included. The characteristic distribution of the data is normal, and the model has good predictive performance in both the training set and the validation set, which provides a sufficient and reliable basis for further clinical application. However, this study also has some limitations. Because it is based on a public database, the clinical data are limited to the items provided in the data set, and other possible prognostic factors could not be explored. In addition, the prognostic risk prediction model constructed in this study still needs external validation to further confirm its effectiveness.

Conclusions

The 5-year CSD prediction model based on the decision tree algorithm provides accurate prediction information. The high-risk patients identified by the prediction model may benefit, in terms of 5-year survival, from surgery combined with chemotherapy. The prediction model is expected to provide a reference for the postoperative management of patients with HCC in the future.