Background

Soft tissue sarcomas (STSs) are a heterogeneous group of malignant tumors that can arise anywhere in the body, making their clinical course variable and the development of a meaningful staging system difficult. With this background in mind, the current eighth edition of the American Joint Committee on Cancer (AJCC) staging system has attempted to incorporate the anatomic primary tumor site as one of the prognostic variables in the T stage, as well as tumor size [1]. Some important factors other than the AJCC staging are also reportedly associated with survival in STS patients, including age, histologic diagnosis, surgical margin, and radiotherapy [2,3,4,5,6,7,8,9,10,11,12]. Accurate estimation of the clinical course using these factors will enable clinicians to select patients who can benefit from invasive treatment and may improve their prognosis. Therefore, in order to achieve precision medicine, more accurate and personalized outcome prediction tools that can incorporate several prognostic factors influencing survival are required [13].

The nomogram, a graphical calculation tool designed to allow prediction of the overall probability of a specific outcome for any individual patient, can provide clinicians with accurate estimates of prognosis for several cancers. Kattan et al. [8] developed the first nomogram for general site STSs based on the 2002 World Health Organization (WHO) classification, and other nomograms for STSs were subsequently reported, especially for site-specific tumors [10, 12]. However, there are some major concerns related to the practical use of such nomograms: 1) since the general site nomogram proposed by Kattan et al. [8], no practical nomograms for patients with trunk STS based on the latest WHO histological classification (2013) have been developed [8, 14]; and 2) previous nomograms were developed on the basis of Caucasian cohorts in countries where treatment strategies including the extent of the surgical margin and indications for radiotherapy differed from those involving Asian cohorts [8,9,10,11,12]. Furthermore, to our knowledge, there are no reported nomograms that can concurrently predict the following three major outcomes during the clinical course of STS: local recurrence-free survival (LRFS), distant metastasis-free survival (DMFS), and disease-specific survival (DSS).

Therefore, in the present study, we aimed to develop nomograms for predicting LRFS, DMFS, and DSS in patients undergoing definitive surgery for trunk and extremity STSs by analyzing data from the Bone and Soft Tissue Tumor (BSTT) Registry in Japan, which is a nationwide organ-specific cancer registry for bone and soft tissue tumors [2, 15].

Methods

Data source

The BSTT Registry is a nationwide organ-specific cancer registry for bone and soft tissue tumors in Japan. Details of the BSTT have been reported elsewhere [2, 15]. Detailed data for patients treated at the participating hospitals are collected annually in a clinician-oriented manner. The survey collects data in two sets. The first survey is conducted annually in May for patients treated between January 1 and December 31 of the previous year, and includes the following data for each patient: 1) basic data related to the patient: hospital, sex, age, date of diagnosis, status at the first visit, etc.; 2) information on the tumor: origin of the tumor (bone, soft tissue), histologic details (malignant or benign, and diagnosis), tumor location, the data required for TNM and Enneking staging (tumor size, nodal or distant metastasis, and histologic grade for malignant tumors); 3) information on surgery: date of definitive surgery, type of surgery, reconstruction details, additional surgery for complications, etc.; and 4) information on treatments other than surgery: details of chemotherapy and radiotherapy. The second survey collects information on prognosis at 2, 5, and 10 years after the initial registration only for patients with bone and soft tissue sarcomas. It includes information on several outcome measures at the time of the latest follow-up, such as local recurrence, distant metastasis, oncologic outcome and limb salvage status. Use of the BSTT Registry for the purposes of clinical research was initiated in 2014 after approval from the Musculoskeletal Tumor Committee of the Japanese Orthopaedic Association (JOA). Approvals for the present study were obtained from the Institutional Review Boards of the JOA and National Cancer Center (2018–085).

Study design and eligibility

Data were obtained from the BSTT Registry during 2006–2015. The inclusion criteria for patients were as follows: 1) a diagnosis of primary STS with no distant metastasis (N0 M0 or N1 M0) arising from the trunk or extremities, 2) aged 18 years or older at diagnosis, and 3) having undergone definitive surgery with curative intent. We defined STSs of the trunk as tumors arising from the trunk excluding the head, those of the upper extremity as tumors arising from the shoulder girdle to the hand, and those of the lower extremity as tumors arising from the pelvic girdle (excluding endopelvic tumors) to the foot. STSs were defined as malignant soft tissue tumors (both low and high grade) based on the 2013 WHO classification [14]. Patients with benign or intermediate STSs were excluded. Patients with soft tissue Ewing sarcoma and alveolar or embryonal rhabdomyosarcoma were also excluded, in view of the characteristic natural course and treatment strategies for these tumors.

Prognostic variables

For this study, we extracted the following data: patient sex, age, tumor size, tumor site (trunk, upper extremity, or lower extremity), tumor depth, histological grade and diagnosis, nodal metastasis status (negative or positive) at diagnosis, surgical margin after surgery, and perioperative adjuvant therapy status. Tumor size was the maximum diameter measured by imaging modalities at the time of diagnosis. Histological grade was classified as either low or high grade. Tumor depth was assessed as superficial or deep relative to the investing fascia. Surgical margin status was defined microscopically as either positive (intralesional) or negative (marginal or wide). Histological diagnosis was based on the latest WHO criteria and grouped into 10 categories [14]: myxoid liposarcoma (MLS), leiomyosarcoma (LMS), dedifferentiated liposarcoma (DDLS), malignant peripheral nerve sheath tumor (MPNST), myxofibrosarcoma (MFS), synovial sarcoma (SySa), undifferentiated pleomorphic sarcoma (UPS), angiosarcoma, pleomorphic liposarcoma (PLS), and others, comprising a diverse range of rare STSs with a definitive diagnosis other than those listed above.

Statistical analysis

The primary endpoints were the occurrence of 1) local recurrence (LR), 2) distant metastasis (DM), and 3) disease-specific death (DSD) at 2 years after surgery. LRFS and DMFS were defined as the period from the date of definitive surgery until the appearance of LR or DM, respectively, or until the last follow-up for patients without the corresponding event. DSS was defined as the period from the date of definitive surgery until DSD, or until the last follow-up for survivors. Patients without these events, patients lost to follow-up, and patients who died without LR or DM or because of other causes were censored at the last follow-up. The LRFS, DMFS and DSS were estimated using the Kaplan-Meier method, and comparisons were assessed using the log-rank test. Univariable and multivariable analyses were conducted using the Cox proportional hazards model. In the multivariable analysis, all the variables (age, sex, site, depth, size, histological diagnosis, histological grade, nodal metastasis, and surgical margin) were entered as independent variables using the forced entry method. We set the alpha level at 0.05 in this study.
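As an illustration of the Kaplan-Meier estimation described above, the following is a minimal Python sketch of the product-limit estimator with right censoring. It is not the SPSS procedure used in this study; the function name `kaplan_meier` and the toy follow-up data are assumptions made purely for illustration.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of survival.

    times  : follow-up time for each patient (e.g., months from surgery)
    events : 1 if the event (e.g., LR) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= n_removed
    return curve


# Hypothetical toy data: 8 patients, months from definitive surgery.
toy_times = [3, 5, 5, 8, 12, 16, 20, 24]
toy_events = [1, 0, 1, 0, 1, 0, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(f"t = {t:>2} months, S(t) = {s:.4f}")
```

Censored patients contribute to the risk set up to their last follow-up but never lower the curve themselves, which is how the censoring rule stated above enters the estimate.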

Nomogram

We developed the multivariate nomograms starting from Cox regression. The probabilities of each endpoint at 2 years after surgery were calculated for each patient with the Cox regression model underlying the nomogram. Then, these models were internally validated with 50 bootstrap samples to account for over-fitting and obtain a relatively unbiased estimate. We evaluated the nomogram performance based on the concordance index and calibration plot. The concordance index indicates the ability of the nomogram to discriminate between patients who experience the event of interest and those who do not, and the calibration plot assesses how well the risk predicted by our nomogram fits the observed risk.
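To make the concordance index concrete, the sketch below computes a Harrell-type C-index for right-censored data in plain Python. It is an illustration only, assuming a simple pairwise definition in which pairs with tied follow-up times are skipped; it is not the routine of the rms package used in this study, and the function name and toy values are hypothetical.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risk: Sequence[float]) -> float:
    """Fraction of usable patient pairs ranked correctly by predicted risk.

    A pair (i, j) is usable when patient i had the event and did so
    strictly before the follow-up time of patient j. The pair is
    concordant when patient i was assigned the higher predicted risk;
    tied risks count as half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: four patients with predicted risks.
c_times = [2, 4, 6, 8]
c_events = [1, 1, 0, 1]
c_risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(c_times, c_events, c_risk)
print(f"C-index = {c_index:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a frame of reference for the concordance indices reported for the nomograms.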

In addition, we performed decision curve analysis (DCA), a statistical method designed to evaluate the clinical usefulness of a prediction model without additional data [16]. DCA identifies the range of risk thresholds over which the model is useful by comparing the net benefit of a model-assisted decision with those of the treat-all and treat-none strategies, which ignore the model. When this range includes the probability at which clinicians are motivated to perform a medical intervention, the model is regarded as clinically useful. This probability also coincides with the point at which the expected harm of a false-positive intervention is balanced against that of a false-negative non-intervention. For instance, if a patient has an LR probability of over 0.5, many clinicians would opt for some form of intervention, such as short-term follow-up using imaging or adjuvant therapy, whereas if the probability is 0.1, many clinicians would opt for simple observation. In this case, if the net benefit of the model-assisted decision exceeds those for treat-all and treat-none with a probability between 0.1 and 0.5, then the model is deemed clinically useful, because for many clinicians the risk threshold lies within this range.
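The comparison underlying DCA can be sketched with the standard net benefit formula for a binary outcome, NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the risk threshold. The Python example below is a simplified illustration under that assumption; it ignores censoring, which a survival-based analysis has to handle, and it is not the rmda routine used in this study. All names and toy values are hypothetical.

```python
from typing import Sequence


def net_benefit_model(y_true: Sequence[int], y_prob: Sequence[float],
                      threshold: float) -> float:
    """Net benefit of intervening on patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of intervening on every patient regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    weight = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * weight


# Hypothetical toy data: 10 patients, 3 of whom had the event.
dca_y = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
dca_p = [0.8, 0.6, 0.3, 0.1, 0.05, 0.4, 0.2, 0.15, 0.5, 0.02]
pt = 0.25
nb_model = net_benefit_model(dca_y, dca_p, pt)
nb_all = net_benefit_treat_all(dca_y, pt)
nb_none = 0.0  # treating nobody yields no benefit and no harm
print(f"model: {nb_model:.3f}, treat-all: {nb_all:.3f}, treat-none: {nb_none:.3f}")
```

Repeating this calculation over a grid of thresholds and plotting the three curves gives a decision curve; the model is useful at thresholds where its curve lies above both reference strategies.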

Statistical analysis was performed using IBM SPSS version 19.0 (IBM SPSS, Armonk, NY, USA), and the nomograms and decision curves were built using R software version 3.0.1 (R Foundation for Statistical Computing, Vienna, Austria) with the rms and rmda packages.

Results

Patient characteristics

We identified 2954 patients who fulfilled the inclusion criteria during 2006–2015. Among them, we excluded patients for whom any values with respect to age, sex, tumor size, grade, depth, location, histological diagnosis, presence or absence of nodal metastasis at diagnosis and surgical margin status were missing. As a result, our cohort consisted of 2827 patients, all of whom had a complete set of data for the variables described above. The clinicopathological characteristics of the cohort are summarized in Table 1. The median follow-up period was 16 months (range, 3–90 months). LR, DM and DSD occurred in 241 (8.5%), 554 (19.6%) and 230 (8.1%) patients over the study period, respectively (Additional file 1: Figure S1, Additional file 2: Figure S2 and Additional file 3: Figure S3).
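The cohort arithmetic above can be checked with a few lines of Python. This is only a consistency check of the reported counts; the percentages are crude proportions of the 2827 analyzed patients over the study period, not Kaplan-Meier estimates, and the variable names are illustrative.

```python
identified = 2954   # patients fulfilling the inclusion criteria
analyzed = 2827     # patients with complete data for all variables
excluded = identified - analyzed  # patients dropped because of missing values

event_counts = {"LR": 241, "DM": 554, "DSD": 230}
crude_percent = {name: round(100 * count / analyzed, 1)
                 for name, count in event_counts.items()}

print(f"excluded for missing data: {excluded}")
for name, pct in crude_percent.items():
    print(f"{name}: {pct}%")
```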

Table 1 Clinicopathologic and treatment characteristics (N = 2827)

Prognostic factors

Kaplan-Meier curves stratified by each variable and the results of the log-rank tests are shown in Additional file 1: Figure S1, Additional file 2: Figure S2 and Additional file 3: Figure S3. Nine prognostic factors selected based on clinical importance were significantly associated with each endpoint.

The results of the univariate and multivariate Cox proportional hazards models for LRFS are listed in Table 2. Multivariate analyses demonstrated that high-risk factors for LR were as follows: arising in the trunk (hazard ratio [HR], 2.55; 95% confidence interval [CI], 1.91–3.41; P < .001), large tumor size (≥ 5 cm and < 10 cm [HR, 1.95; 95% CI, 1.31–2.92; P = .001], ≥ 10 cm [HR, 3.50; 95% CI, 2.31–5.32; P < .001]), LMS [HR, 3.10; 95% CI, 1.43–6.69; P = .004], MFS [HR, 3.46; 95% CI, 1.51–7.94; P = .003], angiosarcoma [HR, 4.95; 95% CI, 1.32–18.58; P = .018], high grade (HR, 2.43; 95% CI, 1.48–3.99; P < .001) and positive margin (HR, 3.12; 95% CI, 2.22–4.37; P < .001).
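For readers who wish to relate a reported hazard ratio to its confidence interval and P value, the sketch below back-calculates the standard error of the log hazard ratio from the 95% CI and derives a two-sided Wald P value. It assumes the interval was computed as exp(log HR ± 1.96 × SE), which is the usual convention for Cox models but is not stated explicitly in the text; the function names are hypothetical. The example uses the reported values for tumors arising in the trunk (HR 2.55; 95% CI 1.91–3.41).

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error of log(HR) implied by a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_95)


def wald_test(hr: float, lower: float, upper: float):
    """Return the Wald z statistic and two-sided P value for log(HR) = 0."""
    se = se_from_ci(lower, upper)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Reported multivariate result for LR, trunk versus reference site.
trunk_hr, trunk_lo, trunk_hi = 2.55, 1.91, 3.41
trunk_z, trunk_p = wald_test(trunk_hr, trunk_lo, trunk_hi)

# On the log scale the interval is symmetric, so the geometric mean of the
# limits should reproduce the point estimate up to rounding.
geometric_mean = math.sqrt(trunk_lo * trunk_hi)
print(f"z = {trunk_z:.2f}, P = {trunk_p:.2e}, sqrt(lo*hi) = {geometric_mean:.2f}")
```

The back-calculated P value falls well below .001, in line with the reported P < .001, and the geometric mean of the limits agrees with the reported hazard ratio to two decimal places.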

Table 2 Cox proportional hazards models for LRFS

Table 3 shows the univariate and multivariate Cox proportional hazards models for DMFS. Multivariate analyses demonstrated that the high-risk factors for DM were as follows: large tumor size (≥ 5 cm and < 10 cm [HR, 2.02; 95% CI, 1.57–2.60; P < .001], ≥ 10 cm [HR, 3.39; 95% CI, 2.59–4.43; P < .001]), LMS [HR, 2.98; 95% CI, 2.00–4.45; P < .001], angiosarcoma [HR, 4.04; 95% CI, 1.78–9.15; P < .001], high grade (HR, 4.08; 95% CI, 2.74–6.09; P < .001) and positive nodal metastasis (HR, 2.06; 95% CI, 1.28–3.31; P = .003).

Table 3 Cox proportional hazards models for DMFS

Table 4 shows the univariate and multivariate Cox proportional hazards models for DSS. Multivariate analyses demonstrated that the high-risk factors of DSD were as follows: large tumor size (≥ 5 cm and < 10 cm [HR, 2.19; 95% CI, 1.41–3.39; P < .001], ≥ 10 cm [HR, 4.73; 95% CI, 3.03–7.38; P < .001]), LMS [HR, 2.94; 95% CI, 1.41–6.10; P = .004], MFS [HR, 3.21; 95% CI, 1.43–7.20; P = .005], angiosarcoma [HR, 11.40; 95% CI, 4.05–32.10; P < .001], high grade (HR, 5.37; 95% CI, 2.61–11.04; P < .001), and positive margin (HR, 1.54; 95% CI, 1.00–2.36; P = .048).

Table 4 Cox proportional hazards models for DSS

Availability of developed nomograms

We developed nomograms to predict LRFS, DMFS, and DSS at 2 years after surgery based on the weighted coefficients for each of the values obtained by Cox regression (Fig. 1a, b and c). The calibration plots for internal validation of these nomograms are shown in Fig. 2a, b and c. Concordance indices for the LRFS, DMFS and DSS nomograms were 0.73, 0.70 and 0.75, respectively, suggesting high predictive accuracy. Furthermore, in order to assess the clinical utility of these nomograms, we conducted DCA. The results of DCA demonstrated that use of these nomograms for clinical decision-making was more efficient than assuming that all patients would be treated or not treated, with a risk threshold ranging from 0.05 to 0.8, from 0.05 to 0.4, and from 0.05 to 0.3 for the LRFS, DMFS and DSS nomograms, respectively (Additional file 4: Figure S4).

Fig. 1

Nomograms predicting the probability of LRFS (a), DMFS (b), and DSS (c) at 2 years after surgery. For use, the value for each variable in an individual patient is selected on the scale, and a line is drawn straight up from there to the Points axis to establish the corresponding score. All the scores are summed, and the total score is plotted on the Total Points line. A line is then drawn straight down to the 2-Year Survival Probability axis to obtain the probability. Dx, histological diagnosis; MLS, myxoid liposarcoma; LMS, leiomyosarcoma; DDLS, dedifferentiated liposarcoma; MPNST, malignant peripheral nerve sheath tumor; MFS, myxofibrosarcoma; SySa, synovial sarcoma; UPS, undifferentiated pleomorphic sarcoma; Angio, angiosarcoma; PLS, pleomorphic liposarcoma; Oth, others; Neg, negative; Pos, positive
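The reading procedure in the legend above corresponds to a simple calculation: each variable level contributes points proportional to its Cox regression coefficient, and the summed contribution is mapped to a survival probability through S(t | x) = S0(t)^exp(lp) under the proportional hazards model. The Python sketch below illustrates this mapping. The coefficients and the baseline 2-year survival are hypothetical placeholders, not the fitted values behind Fig. 1, and the scaling of the largest effect to 100 points is assumed as a common nomogram convention.

```python
import math

# HYPOTHETICAL log-hazard contributions per level (reference level = 0.0).
# These are placeholders for illustration, not the fitted coefficients.
COEFS = {
    "size": {"<5cm": 0.0, "5-10cm": 0.7, ">=10cm": 1.2},
    "grade": {"low": 0.0, "high": 1.4},
    "node": {"neg": 0.0, "pos": 0.7},
}
BASELINE_S2 = 0.95  # HYPOTHETICAL 2-year survival with all variables at reference


def points_table(coefs):
    """Scale contributions so that the widest-ranging variable spans 0 to 100 points."""
    max_range = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    table = {}
    for var, levels in coefs.items():
        lo = min(levels.values())
        table[var] = {lvl: 100.0 * (b - lo) / max_range for lvl, b in levels.items()}
    return table


def two_year_probability(patient):
    """Event-free probability at 2 years from the summed linear predictor."""
    lp = sum(COEFS[var][patient[var]] for var in COEFS)
    return BASELINE_S2 ** math.exp(lp)


POINTS = points_table(COEFS)
reference = {"size": "<5cm", "grade": "low", "node": "neg"}
high_risk = {"size": ">=10cm", "grade": "high", "node": "pos"}

for label, patient in (("reference", reference), ("high risk", high_risk)):
    total = sum(POINTS[var][patient[var]] for var in POINTS)
    prob = two_year_probability(patient)
    print(f"{label}: total points = {total:.1f}, 2-year probability = {prob:.3f}")
```

Because total points are a linear rescaling of the linear predictor, reading the Total Points axis against the probability axis is equivalent to evaluating the formula directly.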

Fig. 2

Calibration plots for internal validation of the LRFS (a), DMFS (b), and DSS (c) nomograms. Gray, ideal; black, observed; blue, bias corrected

Discussion

Recently, nomograms have been accepted for prognostication of sarcoma and other major cancers [9, 11, 17] because they enable clinicians to predict an individual patient’s outcome accurately and are very user-friendly. However, nomograms need to be updated over time in accordance with the development of new treatments or changes in diagnostic criteria that may affect the course of the disease. Nevertheless, there are currently few nomograms for STSs based on the recent WHO histological classification [8, 10, 12]. Moreover, to our knowledge, no nomograms for STSs have been based on Asian cohorts. In the present study, we developed nomograms for prediction of LRFS, DMFS, and DSS after definitive surgery in patients with primary STSs diagnosed on the basis of the recent WHO classification, using a large cohort of Japanese patients in the BSTT Registry. In addition, we confirmed the predictive accuracy and clinical utility of these nomograms by internal validation and DCA, respectively.

We found that the incidences of LR, DM, and DSD at 2 years after surgery were 8.5, 19.6 and 8.1%, respectively, consistent with previous studies [3, 4]. In addition, histological diagnosis, grade, and tumor size have generally been accepted as predictive factors for STS in previous large-scale retrospective studies [2,3,4,5,6,7,8,9,10,11,12,13], and as expected, these factors were independent prognostic factors for all of the outcomes in our multivariate analyses (Tables 2-4). Furthermore, in agreement with previous studies, the surgical margin also strongly influenced the LRFS in our dataset [18, 19]. Because not only the variables used in TNM staging but also other variables influenced each outcome, our prediction models can stratify patients with the same TNM stage according to their risk of each outcome. For example, we compared two stage III cases (seventh edition of the AJCC staging system): 1) a 46-year-old man who presented with a 6 cm deep-seated mass in his lower extremity, which was resected with a negative margin and diagnosed as high-grade DDLS; and 2) a 78-year-old woman who presented with a 20 cm deep-seated mass in her lower extremity, which was also resected with a negative margin and diagnosed as high-grade LMS. The 2-year probabilities of DMFS according to our nomogram were > 90 and 43%, respectively. In fact, the former patient did not experience distant metastasis, whereas the latter developed it 11 weeks after definitive surgery.

One of the unique advantages of our nomograms is that they incorporate nodal metastasis status as a prognostic factor in the predictive models. No predictive nomograms reported to date have incorporated nodal metastasis despite its significant impact on survival, being one of the factors in the AJCC staging system, probably due to the rare incidence of nodal metastasis without distant metastasis, comprising only 2.1% of all STS cases [20, 21]. The use of a national database in our study enabled us to collect a large number of STS cases as a training data set for developing predictive nomograms, enabling us to incorporate rare but important prognostic factors such as nodal metastasis and thus develop a more accurate predictive model. Another advantage of our nomograms is that they can predict three endpoints, unlike previously published nomograms [8,9,10,11,12], in terms of major progression events occurring during the clinical course of cancer, thus providing physicians with more detailed information about the future clinical course, and allowing better clinical care to be offered.

We acknowledge that the present study had some limitations. First, although adjuvant radiotherapy (RT) in addition to surgical resection is accepted as the standard treatment for STS in Western countries based on the favorable results of randomized trials [22, 23], we were unable to find any significant advantage of adjuvant RT for patients with STS. One of the main reasons for this conflicting result may have been the difference in the indications for adjuvant RT, which is used in Japan only for a small proportion of STS patients who have a higher risk of local relapse. In fact, only 22% of patients in our dataset underwent adjuvant radiotherapy, compared with 32–82% in previous reports from Western countries [3, 4, 6, 9, 11]. Therefore, we investigated the differences in background characteristics between patients undergoing and not undergoing adjuvant RT (Additional file 5: Table S1). As expected, patients undergoing adjuvant RT showed significant differences in histological grade, tumor size, surgical margin, and nodal metastasis, suggesting that patients with a higher risk of local relapse underwent adjuvant RT in Japan. Therefore, there will be a need to externally validate our nomograms using specific patient populations before they can be recommended for clinical use in Western countries, where therapeutic strategies are different. Second, as the BSTT data are collected from all hospitals across Japan, the perioperative management and the follow-up interval for checking tumor recurrence after definitive surgery may vary among hospitals. In addition, some cases with pathological misdiagnosis may be included in the BSTT Registry because the rate of discordance of pathological diagnosis in STS is high [24]. These issues may result in information bias. Third, the results of DCA for the LRFS model were acceptable over a wide range of risk probabilities, whereas those for the DMFS and DSS models were acceptable only in the low to middle range (Additional file 4: Figure S4). However, in terms of decision-making, a low or middle risk probability of DM or DSD is a more important issue than high risk, because there is no risk threshold at high probability. Therefore, our nomograms achieved a sufficient level of clinical usefulness. Fourth, because the majority of cases in the BSTT Registry were registered from specialized cancer hospitals, the frequency of severe cases in the BSTT Registry may be higher than that in population-based data. In addition, the BSTT Registry is operated by the JOA, which means that most of the cases are patients treated by musculoskeletal surgeons. Therefore, cases treated in other specialities would not be included in the registry, e.g., advanced cases treated only by medical oncologists, or sarcomas arising from the retroperitoneum or peritoneal cavity, which can be treated by abdominal surgeons. For these reasons, patient backgrounds may differ between the BSTT Registry and population-based databases. Finally, our nomograms were able to predict only the 2-year probability of LRFS, DMFS, and DSS, because the follow-up period for BSTT data is still relatively short. Data for longer follow-up periods will therefore be required in order to develop a predictive model for estimation of survival at 5 or 10 years.

Conclusion

In conclusion, we have created nomograms for prediction of LRFS, DMFS, and DSS at 2 years after definitive surgery for patients with localized STSs in the trunk and extremities. We have confirmed the predictive accuracy and clinical utility of these nomograms by internal validation and DCA, respectively. These nomograms can help clinicians to make decisions regarding adjuvant therapy or the interval of follow-up imaging, and can also serve as a guide for patient counselling.