Introduction

Male breast cancer (MBC) is clinically rare, accounting for approximately 1% of all breast cancers; however, its annual incidence has increased in recent years [1, 2]. Because the incidence of breast cancer in men is much lower than that in women, most breast cancer clinical studies only include women. Therefore, there are few prospective data to guide the clinical treatment of male breast cancer.

Even though the survival rate of breast cancer patients has improved in recent years, patients with distant metastasis still have a worse prognosis, with an overall 5-year survival rate of 27% [3]. Some studies have shown that MBC patients had a worse outcome than female patients, which could be attributed to a later stage at diagnosis, older age at diagnosis, or a subtype with a poor prognosis, such as triple-negative breast cancer (TNBC) [4, 5, 6, 7]. Compared with female breast cancer patients who had distant metastasis, MBC patients with distant metastasis showed a higher proportion of simultaneous bone and lung metastasis [

Fig. 1

The flow chart of patient selection and of the development, evaluation and explanation of the models

Feature selection and data preprocessing

Variables with less than 30% missing values were imputed with the KNNImputer algorithm [16]. Unordered multi-category variables were converted to indicator columns by one-hot encoding [17]. Fourteen candidate features were considered for predicting distant metastasis (M1): age, laterality, grade, T stage, N stage, radiotherapy, chemotherapy, ER, PR, HER-2, subtype_0 (HR+/HER2-), subtype_1 (HR+/HER2+), subtype_2 (HR-/HER2-) and subtype_3 (HR-/HER2+). Logistic least absolute shrinkage and selection operator (LASSO) regression was applied to screen these features [18]. Ultimately, age, T stage, N stage, ER status, subtype_0 (HR+/HER2-) and subtype_2 (HR-/HER2-) were selected to develop the ML models.
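The one-hot step can be illustrated with a minimal Python sketch. The column names subtype_0 to subtype_3 and their meanings follow the feature list above; the function name and the input labels are hypothetical, and the authors' actual preprocessing code is not given in the paper.

```python
# Illustrative only: the category order follows the feature list in the text
# (subtype_0 = HR+/HER2-, ..., subtype_3 = HR-/HER2+).
SUBTYPES = ["HR+/HER2-", "HR+/HER2+", "HR-/HER2-", "HR-/HER2+"]

def one_hot_subtype(label):
    """Map one molecular subtype label to four 0/1 indicator columns."""
    if label not in SUBTYPES:
        raise ValueError(f"unknown subtype: {label!r}")
    return {f"subtype_{i}": int(label == name) for i, name in enumerate(SUBTYPES)}

encoded = one_hot_subtype("HR-/HER2-")
print(encoded)
```

Exactly one indicator is set per patient, so the four columns carry the same information as the single categorical variable without implying any ordering between subtypes.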

The development of ML models

Patients were randomly divided into training and test groups at a ratio of 7:3. Four ML models were examined in this study: extreme gradient boosting (XGBoost), k-nearest neighbor (KNN), decision tree (DT) and support vector machine (SVM). In the training set, the SMOTE resampling method was applied to address the class imbalance, and stratified ten-fold cross-validation (CV) was applied to guard against overfitting. A grid search with ten-fold CV was also used to optimize the hyperparameters of the ML models. The details are shown in Fig. 1.
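The splitting and resampling steps can be sketched in plain Python as follows. This is a simplified illustration rather than the authors' pipeline: the function names, the toy cohort and the choice of k are assumptions, and the paper does not state which SMOTE implementation was used. The point it shows is that synthetic minority samples are generated from the training portion only, so the test set keeps the original class ratio.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with the class ratio preserved in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Hypothetical toy cohort: 7 metastatic (1) and 93 non-metastatic (0) patients,
# each with two made-up numeric features.
rng = random.Random(42)
labels = [1] * 7 + [0] * 93
features = [[rng.random(), rng.random()] for _ in labels]

train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
train_minority = [features[i] for i in train_idx if labels[i] == 1]
n_majority = sum(1 for i in train_idx if labels[i] == 0)

# Oversample only the training minority class until the classes are balanced.
synthetic = smote(train_minority, n_new=n_majority - len(train_minority), k=3)
print(len(train_idx), len(test_idx), len(synthetic))
```

When resampling is combined with cross-validation, it is usually applied inside each training fold so that synthetic points never leak into the fold used for evaluation.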

The evaluation of ML models

We assessed the performance of the ML models in the training, test and external validation sets. Models were evaluated and compared according to the area under the receiver operating characteristic curve (AUC) [19] and the Brier score [20]. Higher AUC values and lower Brier scores indicate better performance.
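For readers unfamiliar with the two metrics, the following sketch computes both from their definitions. The function names and the four-patient example are illustrative assumptions; the paper does not state which software was used to compute the metrics.

```python
def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive case receives a higher
    predicted risk than a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Made-up outcomes (1 = distant metastasis) and predicted probabilities.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(auc, brier)
```

The AUC depends only on how patients are ranked, whereas the Brier score also penalizes probabilities that are poorly calibrated, which is why the two metrics can disagree about which model is best.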

The explanation of ML models

To make the ‘black-box’ ML model easier to understand, the SHAP framework was introduced into this study to interpret the optimal ML model. Its usefulness for interpretation has been validated in many models [

Results

The clinical and pathological characteristics of MBC patients

A total of 2351 MBC patients were included in this retrospective analysis. The median age was 68 years. Grade 2 disease was present in 54.7% of patients and AJCC T0/Tis/T1 disease in 46.0%. A total of 1306 (55.6%) patients had N0 disease. Most patients did not receive radiotherapy (73.3%) or chemotherapy (61.5%). A total of 2038 (86.7%) patients had the HR+/HER2- subtype. A total of 168 (7.1%) patients had distant metastasis, of whom 117 (5.0%) had bone metastasis and 71 (3.0%) had lung metastasis (Table 1).

Table 1 Baseline characteristics of all patients

The performance comparison of different ML models

According to the LASSO regression, the optimal number of features was 6 (Figure S1): age, T stage, N stage, ER status, subtype_0 (HR+/HER2-) and subtype_2 (HR-/HER2-). All four ML models were trained, and none of them exhibited overfitting (Figure S2).

In the training set, the XGB model showed the largest mean AUC (0.884) in the tenfold CV (Fig. 2A); it also had the largest AUC (0.907 vs. 0.839 vs. 0.903 vs. 0.888, Fig. 2B) and the second smallest Brier score (0.125 vs. 0.161 vs. 0.120 vs. 0.136, Fig. 2C). In the test set, the XGB model again showed the largest AUC (0.827 vs. 0.822 vs. 0.769 vs. 0.811, Fig. 2D) and the second smallest Brier score (0.145 vs. 0.161 vs. 0.160 vs. 0.144, Fig. 2E). In the external validation set, the XGB model showed the largest AUC (0.754 vs. 0.717 vs. 0.552 vs. 0.629, Fig. 2F) and the smallest Brier score (0.122 vs. 0.136 vs. 0.159 vs. 0.159, Fig. 2G).

Fig. 2

The performance comparison of different ML models. The AUC comparison of different ML models in the training set (tenfold cross-validation, A). The ROC curves of different ML models in the training (B), test (D) and external validation sets (F). The calibration curves of different ML models in the training (C), test (E) and external validation sets (G)

To further compare the performance of the ML models, the DeLong test was performed. In the training set, the AUC of the XGB model was significantly larger than that of the DT and KNN models (p < 0.05, Table 2). In the test set, no significant difference was observed between the XGB model and the other models (p > 0.05, Table 2). In the external validation set, the AUC of the XGB model was significantly larger than that of the KNN and SVM models (p < 0.05, Table 2).
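As a rough illustration of how two AUCs measured on the same patients can be compared, the sketch below implements the DeLong variance estimate directly from its definition. It is a simple quadratic-time version written for clarity, with hypothetical function names and made-up scores; the paper does not say which implementation produced Table 2, and the sketch assumes the two score vectors are not identical (otherwise the variance is zero).

```python
import math

def _placements(pos, neg):
    """Per-positive and per-negative structural components of the AUC."""
    def psi(p, q):
        return 1.0 if p > q else 0.5 if p == q else 0.0
    v10 = [sum(psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01

def _variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for two correlated AUCs.
    Returns (auc_a, auc_b, z, p_value)."""
    pos_idx = [i for i, y in enumerate(y_true) if y == 1]
    neg_idx = [i for i, y in enumerate(y_true) if y == 0]
    v10_a, v01_a = _placements([scores_a[i] for i in pos_idx],
                               [scores_a[i] for i in neg_idx])
    v10_b, v01_b = _placements([scores_b[i] for i in pos_idx],
                               [scores_b[i] for i in neg_idx])
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    # Variance of (auc_a - auc_b): the variance of the component differences
    # equals var(a) + var(b) - 2 cov(a, b) for each group.
    d10 = [a - b for a, b in zip(v10_a, v10_b)]
    d01 = [a - b for a, b in zip(v01_a, v01_b)]
    variance = _variance(d10) / len(pos_idx) + _variance(d01) / len(neg_idx)
    z = (auc_a - auc_b) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p_value

# Made-up outcomes and risk scores from two hypothetical models.
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1, 0.1]
model_b = [0.6, 0.3, 0.7, 0.2, 0.5, 0.4, 0.1, 0.3, 0.2, 0.6]

auc_a, auc_b, z, p_value = delong_test(y, model_a, model_b)
print(round(auc_a, 3), round(auc_b, 3), round(z, 3), round(p_value, 3))
```

Because both models are scored on the same patients, the test accounts for the correlation between the two AUC estimates, which an unpaired comparison would ignore.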

Table 2 The AUC comparison of different ML models in different sets

Although no significant AUC difference was observed in the test set, which could be attributed to the limited data, the XGB model still showed better performance in the training and external validation sets. Therefore, the XGB model was selected as the optimal ML model for predicting distant metastasis risk in MBC patients.

The development of nomogram

In the training set, univariable and multivariable logistic regression analyses were applied to identify independent risk factors for the construction of the nomogram. In the univariable logistic regression analysis, age, grade, AJCC.T, AJCC.N, chemotherapy, subtype, ER, PR and HER-2 were significantly associated with M1 (p < 0.05, Table S1). Multicollinearity among these parameters was then tested. Subtype was excluded from the multivariable analysis because its variance inflation factor (VIF) was > 5, and the other variables were incorporated. The multivariable logistic regression analysis demonstrated that patients with younger age, G3, T3/T4/TX, N(+) or ER-negative status had a higher risk of distant metastasis (p < 0.05, Table S1).

Characteristics with p < 0.05 in the multivariable logistic regression analysis of the training set were incorporated to develop the nomogram (Figure S3A). The C-index for distant metastasis prediction was 0.802 in the training set (Figure S3B), 0.838 in the test set (Figure S3D) and 0.706 in the external validation set (Figure S3F). Similar results (0.790, 0.838 and 0.701, respectively) were observed when bootstrapping was used for internal validation. The predicted distant metastasis risk was highly consistent with the actual observations in the training set (Figure S3C) but was not in good agreement with the actual observations in the test (Figure S3E) and external validation (Figure S3G) sets.
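Agreement between predicted and observed risk is typically read from a calibration curve. The sketch below shows one simple way to tabulate such a curve by grouping predictions into equal-width probability bins; the binning scheme, function name and data are assumptions for illustration, and the paper does not describe how its calibration plots were constructed.

```python
def calibration_bins(y_true, y_prob, n_bins=5):
    """Group predictions into equal-width probability bins and return
    (mean predicted probability, observed event fraction, count)
    for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((y, p))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            rows.append((mean_pred, observed, len(members)))
    return rows

# Made-up outcomes (1 = distant metastasis) and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.30, 0.35, 0.70, 0.85, 0.90]

rows = calibration_bins(y_true, y_prob)
for mean_pred, observed, count in rows:
    print(f"predicted {mean_pred:.3f}  observed {observed:.3f}  n={count}")
```

A well-calibrated model yields rows in which the predicted and observed columns are close; systematic gaps in one direction indicate over- or under-estimation of risk.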

The performance comparison of XGB model and nomogram

For a more detailed assessment of the performance of the XGB model, the predictive performance was compared between XGB model and nomogram.

The AUC of the XGB model was larger than that of the nomogram in the training (0.907 vs. 0.802) and external validation (0.754 vs. 0.706) sets, and slightly lower in the test set (0.827 vs. 0.838). In addition, the Z statistic of the XGB model was greater than that of the nomogram in the training (77.248 vs. 13.029), test (10.901 vs. 9.764) and external validation (4.915 vs. 3.556) sets (Table 3). Overall, the predictive performance of the XGB model was therefore considered better than that of the nomogram.

Table 3 The AUC comparison of XGB model and nomogram (based on multivariable logistic regression analysis) in different sets

The prediction of bone and lung metastasis based on the XGB model

Based on the above results, the XGB model showed the best predictive ability. The two most common sites of distant metastasis were bone and lung [25]. Therefore, we further predicted the risk of bone and lung metastasis in male breast cancer patients using the XGB model. For bone metastasis, the XGB model showed a high AUC (0.880, 0.823 and 0.747) and a low Brier score (0.136, 0.149 and 0.095) in the training, test and external validation sets, respectively (Fig. 3). For lung metastasis, the XGB model also showed a high AUC (0.906, 0.859 and 0.756) and a low Brier score (0.143, 0.149 and 0.112) in the training, test and external validation sets, respectively (Fig. 4).

Fig. 3

The prediction of bone metastasis based on the XGBoost model. The ROC curves of the XGBoost model in the training (A), test (C) and external validation sets (E). The calibration curves of the XGBoost model in the training (B), test (D) and external validation sets (F)

Fig. 4

The prediction of lung metastasis based on the XGBoost model. The ROC curves of the XGBoost model in the training (A), test (C) and external validation sets (E). The calibration curves of the XGBoost model in the training (B), test (D) and external validation sets (F)

The interpretability of the XGB model

Based on the above results, the XGB model showed the best predictive ability. Therefore, the SHAP framework was introduced to interpret the model. Figure 5A ranks all of the risk factors by mean absolute SHAP value. T stage, age and N stage were the three most important variables. Figure 5B illustrates how the risk factors influence distant metastasis. The y-axis represents the value of the risk factors, and the x-axis (SHAP value) represents the impact of the risk factors on the model output (distant metastasis). High T stage, younger age, high N stage, ER-negative status and the HR(-)/HER2(-) subtype (subtype_2) increased the probability of distant metastasis.
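The importance ranking by mean absolute SHAP value amounts to a column-wise average of absolute contributions. The following sketch shows the calculation on a small matrix; the numbers are hypothetical and were chosen only so that the top three features match the order reported in the text, and the function name is an assumption.

```python
# Hypothetical per-patient SHAP values (rows) for the six model features
# (columns). These are not values from the study.
FEATURES = ["T stage", "age", "N stage", "ER", "subtype_0", "subtype_2"]
shap_values = [
    [1.20, -0.60, 0.50, -0.10, 0.05, 0.02],
    [-0.90, 0.70, -0.40, 0.20, -0.05, 0.00],
    [0.80, -0.50, 0.30, -0.15, 0.10, -0.03],
]

def mean_abs_shap(values, names):
    """Rank features by the mean of |SHAP value| across patients."""
    n = len(values)
    importance = {
        name: sum(abs(row[j]) for row in values) / n
        for j, name in enumerate(names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)

ranking = mean_abs_shap(shap_values, FEATURES)
for name, score in ranking:
    print(f"{name:10s} {score:.3f}")
```

Taking the absolute value before averaging matters: a feature that pushes risk up for some patients and down for others would otherwise appear unimportant because its contributions cancel.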

Fig. 5

The XGB model’s interpretation. The importance ranking of the different variables according to the mean (∣SHAP value∣) (A); the importance ranking of different risk factors with stability and interpretation using the optimal model (B). The higher the SHAP value of a feature, the higher the predicted risk of distant metastasis. Red indicates a higher feature value. A representative sample with distant metastasis (C), and a representative sample without distant metastasis (D)

The combination of different variables influenced the patient outcome. Therefore, to demonstrate the model's interpretability, we provided two representative samples: a patient with distant metastasis who had AJCC T2 stage and HR(-)/HER2(-) disease (Fig. 5C), and a patient without distant metastasis who had AJCC T1 and AJCC N0 stage (Fig. 5D). The patient with distant metastasis had a high SHAP value (3.31) and a high prediction score (0.965); the patient without distant metastasis had a low SHAP value (-4.61) and a low prediction score (0.010).
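The two pairs of numbers are consistent with reading the reported SHAP value as the model output on the log-odds scale and the prediction score as its logistic transform. This reading is an assumption, since the text does not state how the two quantities are related; the short check below simply applies the logistic function to the reported values.

```python
import math

def log_odds_to_probability(log_odds):
    """Logistic (sigmoid) transform from log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Values reported for the two example patients in Fig. 5C and Fig. 5D.
for value in (3.31, -4.61):
    print(f"{value:+.2f} -> {log_odds_to_probability(value):.3f}")
```

Under this assumption, a total of 3.31 corresponds to a probability of about 0.965 and a total of -4.61 to about 0.010, matching the prediction scores quoted above.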

The application of the XGB model

To make it easier for others to use this model, we developed a Web APP based on the XGB model. For example (Fig. 6), a patient's information is entered into the model: age 68 years, AJCC T1, AJCC N0, ER negative and HR(+)/HER2(-). The model then outputs a probability of distant metastasis of 0.0892, which indicates that this patient has a very low risk of distant metastasis. The Web APP is available online (https://greenmood.shinyapps.io/male/).

Fig. 6

Screenshot of the Web APP based on XGBoost model, which is available at https://greenmood.shinyapps.io/male/

Discussion

Although MBC is rare, its incidence is gradually increasing. A previous study showed that MBC patients had a higher proportion of advanced disease than female breast cancer patients [26], which could be attributed to a lack of awareness and screening of breast cancer in men [27]. Therefore, it is necessary to identify and predict the risk of distant metastasis in MBC patients in a timely manner. This study demonstrated that the predictive ability of the XGB model is better than that of the other ML models and the nomogram in predicting distant metastasis risk in male breast cancer patients. In addition, this model could also accurately predict bone and lung metastasis risk. Through the SHAP value of each variable, the contribution and impact of each risk factor on distant metastasis were intuitively demonstrated.

The clinicopathological characteristics of MBC differ from those of female breast cancer (FBC). The results of the international MBC program [25, 28] demonstrated that the median age at diagnosis of MBC patients was 68.4 years, up to 99.3% of patients were ER positive, and only 8.7% were HER-2 positive. In this retrospective analysis of data from the US SEER database and our hospital, similar clinicopathological characteristics were observed. The median age was 68.0 years. Approximately half of the patients (1286, 54.7%) had grade 2 cancer, as previously reported [29, 30]. Most patients had the HR+/HER2- subtype (2038, 86.7%). Up to 97.5% of patients were ER positive (99.1% in the validation set), and only 11.7% were HER2 positive (6.4% in the validation set). A total of 168 (7.1%) patients had distant metastasis, and the two most common sites were bone and lung, which is also consistent with a previous report [25].

In different international breast cancer guidelines, the standard of therapy for MBC is based on that for FBC [31, 32]. Although MBC patients can benefit from local and systemic treatment, the prognosis of MBC is worse than that of FBC [26] because of the later stage or older age at diagnosis. In addition, MBC patients showed a higher risk of contralateral breast cancer than FBC patients, which also increased the risk of death [33]. Delays in seeking medical treatment due to a lack of knowledge or public education also contribute to the poor prognosis of MBC patients [34]. However, recent studies also found that MBC patients had a similar or better prognosis than FBC patients after adjusting for risk factors such as age and stage [9, 10]. Therefore, early detection, early diagnosis and early treatment are very important for improving the prognosis of breast cancer. In clinical practice, we have noticed that many male patients refuse professional breast examinations because of embarrassment or a lack of public education about MBC, which delays medical attention. A tool or model that predicts the probability of distant metastasis could help encourage MBC patients to receive timely professional examination or treatment.

In recent years, ML models have been widely applied to predict survival or lymph node metastasis in breast cancer [23, 35, 36]. However, they have not been used to predict the distant metastasis risk of MBC patients. In this research, we compared the predictive ability of four ML algorithms, and XGB was the best model for predicting distant metastasis in MBC patients. The XGB model showed the largest mean AUC in the tenfold CV (0.884) and the largest AUC in the training (0.907), test (0.827) and external validation (0.754) sets. The differences in AUC between these sets may be due to the unbalanced data (only 7.1% of patients had distant metastasis) and the limited sample size of the external validation set. We applied statistical methods (such as SMOTE resampling) to address this problem, but the calibration curves still showed a slight deviation. Nevertheless, the XGB model presented a better calibration curve and a better net benefit than the other three ML models, with the smallest Brier score (0.122) in the external validation set. In the future, a larger and more balanced sample could improve the performance of the XGB model. In addition, the XGB model also demonstrated a strong ability to predict bone and lung metastasis in all three sets. Whereas many ML models lack interpretability [37, 38], we introduced the SHAP framework to interpret the “black box” of the XGB model. Feature importance was shown by the summary plots based on the SHAP values. In addition, the SHAP values showed how each variable influences the outcome, and the force plots illustrated two representative individual samples (Fig. 5).

To date, only one study has explored the relationship between clinicopathological characteristics and distant metastasis of MBC using a nomogram [39]. However, that nomogram performed worse than our ML model in the training set (AUC: 0.822 vs. 0.907) and lacked external validation, which reduces its reliability and practicability. Currently, an increasing number of ML models have been applied to the prediction of lymph node metastasis or survival status. However, they have not been used to predict distant metastasis in male breast cancer patients. In addition, no previous study has compared the ability of ML models and a nomogram to predict distant metastasis in male breast cancer patients. Our previous study demonstrated that the XGB model outperformed the nomogram in predicting lymph node metastasis in patients with invasive micropapillary carcinoma of the breast [23]. In this study, the results also showed that the XGB model had better predictive ability than the nomogram for M1 in MBC patients.

To make it easier for other researchers to use our model, we developed a public Web APP. After entering some necessary parameters, the user could obtain the probability of distant metastasis of an MBC patient. We believe that the model could urge MBC patients to receive standard treatment in time by telling them the probability of distant metastasis or help clinicians adjust the treatment plan in a timely manner.

This is the first study to develop, test and validate an ML model for the prediction of distant metastasis in MBC patients. Some limitations should also be noted. First, the data extracted from the US SEER database and our hospital are limited; more data from other regions would help the application of the XGB model. Second, the information available in the SEER database is limited, and training a model on a cohort with more clinical and pathological characteristics (such as AR status and Ki67 index) would help further improve the performance of the ML model. Third, the TNM staging information in the SEER database from 2010 to 2015 is ambiguous. Therefore, it is necessary to include pure pathological data to develop an ML model in the future.

Conclusions

The XGB model is a better tool for the prediction of distant metastasis among MBC patients than the other ML models and the nomogram. It is also a powerful model for predicting bone and lung metastasis. The SHAP framework could effectively help clinicians intuitively understand how a variable influences the outcome of an MBC patient. The Web APP based on the XGB model could help doctors adjust treatment plans or urge MBC patients to receive standard treatment in time.