1 Introduction

Primary retroperitoneal sarcoma (RPS) is a rare malignancy, accounting for about 0.1% of adult cancers globally, with an estimated annual incidence of 0.5 to 1 new case per 100,000 individuals [1]. RPS originates within the expansive retroperitoneal space and typically causes no discernible symptoms in its early stages. Clinical manifestations become evident only later, when the tumor encroaches on or compresses neighboring organs. By then, the neoplasm has often grown large and infiltrated adjacent structures, resulting in an unfavorable prognosis [2]. RPS comprises more than 70 histologic subtypes [3], which vary in presentation, behavior, and long-term outcomes; this heterogeneity underscores the importance of accurate prognostic models to guide patient management. Surgical resection is the preferred treatment for RPS and can significantly improve prognosis [4,5,6,7]. However, not all patients are suitable candidates for surgery, owing to factors such as metastasis or extensive tumor burden.

Because primary RPS is rare and its histological subtypes are diverse, predicting patient prognosis is challenging. A nomogram is a valuable visual tool in clinical medicine: it provides personalized survival estimates based on individual patient characteristics, helping doctors understand survival probabilities, assess risk, and make treatment decisions. In clinical investigation, nomograms help identify suitable participants and enhance trial efficiency and reliability. They also help patients and families understand survival probabilities and risks, improving trust and adherence to treatment plans. In these ways, nomograms can improve patient care and outcomes. Although several nomograms have been developed to predict the prognosis of RPS patients with surgical resection [15].

Age, tumor size, median household income, and local population size and degree of urbanization were converted to categorical variables. For OS, X-tile software divided age into three groups (≤ 55 years, 56–78 years, and ≥ 79 years) and tumor size into two groups (≤ 7.7 cm and > 7.7 cm). For CSS, age was divided into three groups (≤ 54 years, 55–78 years, and ≥ 79 years) and tumor size into two groups (≤ 10.4 cm and > 10.4 cm). For both OS and CSS, median household income was divided into two groups (0–54,999 dollars and ≥ 55,000 dollars), and the population size and degree of urbanization of the patient's place of residence were divided into two groups [I: nonmetropolitan counties not adjacent to a metropolitan area, nonmetropolitan counties adjacent to a metropolitan area, and counties in metropolitan areas of less than 250,000 population; II: counties in metropolitan areas of 250,000 to 1 million population and counties in metropolitan areas of greater than or equal to 1 million population].
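
The grouping rule described above can be illustrated with a minimal Python sketch. It assumes that each quoted upper bound is inclusive (for example, an age of 55 falls in the ≤ 55 group). The constant and function names are hypothetical and are not taken from the study, which derived the cut-points with X-tile.

```python
from bisect import bisect_left

# Cut-points quoted in the text; each value is the inclusive upper bound of a group.
OS_AGE_CUTS = [55, 78]         # <=55, 56-78, >=79 years
CSS_AGE_CUTS = [54, 78]        # <=54, 55-78, >=79 years
OS_SIZE_CUTS_CM = [7.7]        # <=7.7 cm, >7.7 cm
CSS_SIZE_CUTS_CM = [10.4]      # <=10.4 cm, >10.4 cm
INCOME_CUTS_USD = [54999]      # 0-54,999 dollars, >=55,000 dollars


def categorize(value, cuts):
    """Return the 0-based index of the group that `value` falls into.

    `cuts` must be sorted; a value equal to a cut-point stays in the lower group.
    """
    return bisect_left(cuts, value)


# Example: a 79-year-old patient with a 7.7 cm tumor (OS model groups).
example_age_group = categorize(79, OS_AGE_CUTS)        # oldest group
example_size_group = categorize(7.7, OS_SIZE_CUTS_CM)  # smaller-tumor group
```

Because the same helper serves every variable, switching between the OS and CSS groupings only requires passing a different cut-point list.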
Univariate and multivariate Cox regression analyses were performed on the developing set; variables with P < 0.05 in the univariate analysis were included in the multivariate analysis. To develop the two nomograms, we fitted multivariable Cox regression models for OS and CSS using a backward procedure with the Akaike information criterion (AIC) as the variable selection criterion [16]. The OS nomogram was validated internally and externally, and the CSS nomogram was validated internally; concordance indices (C index) and calibration plots were used to measure discrimination and calibration ability. Furthermore, the accuracy of the nomograms for 3-, 5- and 7-year survival prediction was compared with that of the TNM staging system using the area under the receiver operating characteristic (ROC) curve (AUC).
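
To make the discrimination measure concrete, the following is a minimal sketch of Harrell's concordance index in plain Python. It is an assumption that this variant corresponds to the C index reported here. The sketch ignores tied survival times, and the function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by the risk score.

    A pair (i, j) is comparable when patient i had an observed event and
    patient j was followed for longer than i. The pair is concordant when
    the patient with the shorter survival has the higher risk score;
    tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up in months, event flag (1 = death), risk score.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_c = harrell_c_index(toy_times, toy_events, [0.9, 0.2, 0.4, 0.6])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a C index near 0.6 for a staging system indicates only modest discrimination.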

2.3 Statistical analysis

Statistical analysis was performed using R software, version 4.0.3 (R Foundation for Statistical Computing). Variable selection based on AIC was implemented using the MASS R package [17]. All potentially prognostic factors were included: age, sex, race and ethnicity, tumor size, histopathological type, FNCLCC grade, tumor multifocality, surgery of primary site, surgery of regional lymph nodes, pre-/postoperative chemotherapy, median household income, and local population size and degree of urbanization. Multivariable Cox regression analysis was performed to calculate the hazard ratios, two-sided Wald test P values, and 95% confidence intervals of the selected variables and to build the nomograms. A risk score for each patient was calculated using the survival R package, and the median was used as the cutoff value between high-risk and low-risk groups [18]. Survival curves were compared using the log-rank test. The model built on the developing set was applied to the internal and external validation sets. The bootstrap method was applied with 1000 iterations.
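
For readers unfamiliar with the log-rank test, a minimal two-group sketch in plain Python is given below. It is an illustration only and is not the R code used in this study. The function name and the toy data are hypothetical, and the P value relies on the chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = 0.0
    expected_a = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t in each group.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Events occurring exactly at time t in each group.
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data (hypothetical): group A dies early, group B dies late.
chi2_toy, p_toy = logrank_test(
    [1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
    [10, 11, 12, 13, 14], [1, 1, 1, 1, 1],
)
```

The statistic compares the observed number of events in one group with the number expected if both groups shared the same hazard, so identical groups give a statistic of zero.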

3 Results

3.1 Patient characteristics

The baseline clinicopathological and epidemiological characteristics of the 1427 patients with RPS (mean [SD] age, 60.2 [15.0] years) are shown in Table 1; the characteristics were well balanced between the developing set and the validation set.

Table 1 Demographic, clinical, and pathologic characteristics according to datasets

Among the study population, 619 (43.4%) patients were female, and 547 (38.3%) were male. The median age was 60 (range, 0–95) years. The median tumor size was 139.5 (range, 11 to > 989) mm. The median follow-up time was 29 (range, 1–140) months. The median OS and CSS were 56.21 (IQR: 47.20–71.10) and 67.54 (IQR: 61.01–78.15) months, respectively. Liposarcoma, leiomyosarcoma, fibrosarcoma, and other pathological types were diagnosed in 848 (59.4%), 397 (27.8%), 26 (1.9%), and 156 (10.9%) patients, respectively. Race and ethnicity were divided into 3 categories: White (906, 63.5%), Black (127, 8.9%), and others (394, 27.6%).

3.2 Variable selection and multivariable Cox analysis

The results of the univariate and multivariate Cox regression analyses are displayed in Table 2; P < 0.05 was considered significant. Based on AIC, age, FNCLCC grade, chemotherapy, tumor size, primary site surgery, and tumor multifocality were identified as independent risk factors for OS and CSS. The final OS and CSS multivariate Cox regression models are displayed in Table 3.

In the multivariable Cox analysis of the developing set, the independent risk factors for worse OS were older age (56–78 years vs ≤ 55 years, hazard ratio [HR], 1.49; 95% confidence interval [CI] 1.14–1.96; P < 0.01; ≥ 79 years vs ≤ 55 years, HR, 3.00; 95% CI 2.06–4.38; P < 0.001), higher FNCLCC grade (grade II vs grade I, HR, 2.11; 95% CI 1.48–3.02; grade III vs grade I, HR, 3.95; 95% CI 2.85–5.49; both P < 0.001), pre-/postoperative chemotherapy (yes vs no, HR, 2.01; 95% CI 1.53–2.64; P < 0.001), tumor size > 7.7 cm (> 7.7 cm vs ≤ 7.7 cm, HR, 3.06; 95% CI 1.96–4.79; P < 0.001), and no surgery of the primary site (no vs yes, HR, 3.35; 95% CI 2.23–5.04; P < 0.001).

The independent risk factors for worse CSS were older age (55–78 years vs ≤ 54 years, HR, 1.43; 95% CI 1.03–1.98; P < 0.05; ≥ 79 years vs ≤ 54 years, HR, 2.54; 95% CI 1.54–4.19; P < 0.001), higher FNCLCC grade (grade II vs grade I, HR, 2.69; 95% CI 1.67–4.34; grade III vs grade I, HR, 5.30; 95% CI 3.39–8.28; both P < 0.001), pre-/postoperative chemotherapy (yes vs no, HR, 1.93; 95% CI 1.39–2.69; P < 0.001), tumor size > 10.4 cm (> 10.4 cm vs ≤ 10.4 cm, HR, 2.60; 95% CI 1.74–3.88; P < 0.001), no surgery of the primary site (no vs yes, HR, 4.58; 95% CI 2.90–7.23; P < 0.001), and single neoplasm (single vs multiple, HR, 4.42; 95% CI 2.67–7.31; P < 0.001).

Median household income and local population size and degree of urbanization were excluded from both multivariable Cox models based on AIC.
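
The link between a reported hazard ratio, its 95% confidence interval, and the two-sided Wald P value can be checked with a short sketch. It assumes that the interval is symmetric on the log scale, which is the usual construction for Cox models. The function name is hypothetical, and because the inputs are rounded published values, the output is approximate.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_hr(hr, ci_low, ci_high):
    """Recover the coefficient, standard error, z statistic, and two-sided P value."""
    beta = math.log(hr)
    # Assumes the CI was built as exp(beta +/- Z_95 * se).
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return beta, se, z, p_value


# OS model, age 56-78 vs <=55 years: HR 1.49 (95% CI 1.14-1.96), reported P < 0.01.
beta_os_age, se_os_age, z_os_age, p_os_age = wald_from_hr(1.49, 1.14, 1.96)

# CSS model, age 55-78 vs <=54 years: HR 1.43 (95% CI 1.03-1.98), reported P < 0.05.
beta_css_age, se_css_age, z_css_age, p_css_age = wald_from_hr(1.43, 1.03, 1.98)
```

Both recomputed P values fall within the thresholds reported in the text, which is a quick consistency check on the tabulated intervals.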

Table 2 Univariable and multivariable Cox regression for analyzing the risk factors affecting overall and cancer-specific survival
Table 3 Results of the final multivariable Cox models (after the AIC-based backward selection) for OS and CSS

3.3 Nomogram development and validation

Based on the developing sets, nomograms for OS and CSS were constructed from the factors retained in the respective multivariate Cox regression models (Fig. 2). FNCLCC grade had the greatest weight, contributing a maximum of 100 points in both the OS and CSS nomograms.
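
How a nomogram assigns points can be sketched as follows. The sketch assumes the common convention that points are proportional to the log hazard ratio, with the largest effect scaled to 100 points and reference levels scoring 0. It uses the rounded OS hazard ratios reported in Section 3.2, so the values are illustrative and are not claimed to reproduce the axes of Fig. 2. All names are hypothetical.

```python
import math

# Rounded OS hazard ratios from the multivariable Cox model (reference levels omitted).
OS_HAZARD_RATIOS = {
    ("age", "56-78"): 1.49,
    ("age", ">=79"): 3.00,
    ("grade", "II"): 2.11,
    ("grade", "III"): 3.95,
    ("chemotherapy", "yes"): 2.01,
    ("tumor_size", ">7.7cm"): 3.06,
    ("primary_site_surgery", "no"): 3.35,
}


def nomogram_points(hazard_ratios):
    """Scale log hazard ratios so that the largest effect maps to 100 points.

    Assumes every hazard ratio is greater than 1, as is the case above.
    """
    if any(hr <= 1.0 for hr in hazard_ratios.values()):
        raise ValueError("this sketch only handles hazard ratios above 1")
    betas = {level: math.log(hr) for level, hr in hazard_ratios.items()}
    beta_max = max(betas.values())
    return {level: 100.0 * beta / beta_max for level, beta in betas.items()}


def total_points(points, patient_levels):
    """Sum the points of a patient's non-reference levels."""
    return sum(points[level] for level in patient_levels)


OS_POINTS = nomogram_points(OS_HAZARD_RATIOS)
# Hypothetical patient: grade III tumor, aged 79 years or older.
example_total = total_points(OS_POINTS, [("grade", "III"), ("age", ">=79")])
```

Under this convention, grade III carries the largest OS hazard ratio and therefore receives the full 100 points, consistent with the statement that FNCLCC grade contributes the maximum score.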

Fig. 2 Nomograms for predicting OS (A) and CSS (B) based on the results of the multivariable Cox analysis of the developing set

The OS nomogram was validated internally and externally, whereas the CSS nomogram was validated only internally because the external validation data were incomplete. The C indices of the OS nomogram (developing set: 0.76 [95% CI 0.69–0.83]; internal validation set: 0.76 [95% CI 0.66–0.86]; external validation set: 0.69 [95% CI 0.58–0.80]; whole set: 0.75 [95% CI 0.70–0.80]) and the CSS nomogram (developing set: 0.81 [95% CI 0.74–0.87]; internal validation set: 0.80 [95% CI 0.70–0.90]; whole set: 0.80 [95% CI 0.75–0.85]) were higher than those of the TNM staging system (OS: developing set, 0.61 [95% CI 0.54–0.67]; internal validation set, 0.60 [95% CI 0.50–0.70]; whole set, 0.60 [95% CI 0.55–0.66]; CSS: developing set, 0.60 [95% CI 0.53–0.67]; internal validation set, 0.55 [95% CI 0.45–0.65]; whole set, 0.58 [95% CI 0.53–0.64]), which demonstrated that the models had good discrimination ability.

Furthermore, we compared the predictive ability of the nomograms and the TNM staging system through the AUC values for 3-, 5- and 7-year OS and CSS (Fig. 3). The AUC values of the nomogram for predicting 3-, 5- and 7-year OS were 0.84, 0.82, and 0.78 in the developing set, 0.81, 0.82, and 0.83 in the internal validation set, and 0.68, 0.75, and 0.72 in the external validation set, respectively, while for the TNM staging system, the AUCs were 0.72, 0.66, and 0.65 in the developing set, and 0.67, 0.67, and 0.64 in the internal validation set. The AUCs of the nomogram for predicting 3-, 5- and 7-year CSS were 0.88, 0.88, and 0.85 in the developing set, and 0.86, 0.88, and 0.86 in the internal validation set, respectively, while for the TNM staging system, the AUCs were 0.63, 0.60, and 0.58 in the developing set, and 0.57, 0.56, and 0.54 in the internal validation set.

Fig. 3 ROC curves evaluating the predictive ability of the nomograms (A–C, F, G) and the TNM staging system (D, E, H, I). A–C ROC curves of the nomogram for predicting OS in the developing set, internal validation set, and external validation set, respectively; D, E ROC curves of the TNM staging system for predicting OS in the developing set and internal validation set; F, G ROC curves of the nomogram for predicting CSS in the developing set and internal validation set; H, I ROC curves of the TNM staging system for predicting CSS in the developing set and internal validation set

The calibration plots showed good consistency between nomogram-predicted and actual survival in the developing sets and validation sets (Fig. 4). Using the median risk scores of 0.91 and 0.33 as the cutoff values for the OS and CSS models, respectively, the patients in each cohort were stratified into high-risk and low-risk groups. The OS and CSS curves of the two risk groups differed significantly in each dataset (log-rank P < 0.001 in each dataset, Fig. 5). The same conclusion was obtained for the validation sets.
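
The stratification step and the survival curves it feeds can be sketched in plain Python. The sketch assumes that patients whose score equals the median are assigned to the low-risk group, which the text does not specify. The function names and toy values are hypothetical, and the Kaplan–Meier estimator shown is the standard product-limit form rather than the study's own code.

```python
import statistics


def split_by_median(risk_scores):
    """Return the median cutoff and a 'high'/'low' label for each patient."""
    cutoff = statistics.median(risk_scores)
    # Assumption: scores equal to the cutoff are labelled low risk.
    labels = ["high" if score > cutoff else "low" for score in risk_scores]
    return cutoff, labels


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if e and x == t)
        survival *= 1.0 - deaths / at_risk  # product-limit update
        curve.append((t, survival))
    return curve


# Toy data (hypothetical): five risk scores and four follow-up records.
toy_cutoff, toy_labels = split_by_median([0.2, 0.5, 0.91, 1.4, 2.0])
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Applying `kaplan_meier` separately to the high-risk and low-risk patients yields the two curves that a log-rank test then compares.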

Fig. 4 Calibration plots for estimating OS (A–I) and CSS (J–O) survival probability at 3, 5, and 7 years. A–C Calibration plots in the OS developing set; D–F Calibration plots in the OS internal validation set; G–I Calibration plots in the OS external validation set; J–L Calibration plots in the CSS developing set; M–O Calibration plots in the CSS internal validation set

Fig. 5 Kaplan–Meier OS (A–C) and CSS (D, E) curves of the different risk groups for RPS patients based on risk group stratification in each cohort. A Kaplan–Meier OS curves of the developing set; B Kaplan–Meier OS curves of the internal validation set; C Kaplan–Meier OS curves of the external validation set; D Kaplan–Meier CSS curves of the developing set; E Kaplan–Meier CSS curves of the internal validation set

4 Discussion

Most patients with RPS have a grave prognosis with a high recurrence rate, yet their life expectancy varies greatly [3, 19], which makes survival difficult to assess. RPS is a rare tumor with many pathological types; the most common types, such as liposarcoma and leiomyosarcoma, have been studied more frequently [20,21,22,

5 Conclusions

This study developed and validated nomograms for predicting the prognosis of patients with RPS based on a large cohort; clinicopathological and epidemiological features were integrated to predict patients' 3-, 5- and 7-year OS and CSS. The nomograms have been verified both internally and externally and provide better discrimination than the AJCC TNM staging system, helping to guide monitoring and potentially improve long-term survival outcomes, and they show potential for future clinical application. They can be used in patient consultations to provide useful information to both doctors and patients, to assess risk in individual patients, and to enhance clinical trial stratification. They may also be used in determining the proper timing for end-of-life discussions and/or hospice referrals.