Introduction

Active surveillance (AS) is the recommended strategy for patients with low-risk, localized prostate cancer (PCa) and is being recommended for some intermediate-risk patients [1, 2]. It aims to postpone or avoid active treatment in individuals with localized PCa, while maintaining their quality of life and functional outcomes, and reducing overtreatment [1, 3]. However, the lack of consensus on inclusion criteria stringency and disease progression definition has led to significant variability in AS protocols across different centers and guidelines [4, 5]. Consequently, the cumulative five-year dropout rate on AS reaches 44%, with 27% triggered by disease progression [6, 7]. Given the potential for tumor progression and metastasis during AS, determining optimal selection criteria remains a crucial issue.

According to established clinical criteria and indicators, including prostate-specific antigen (PSA), clinical T-stage, and biopsy findings, current guidelines classify patients with localized PCa into risk categories and recommend AS for all low-risk patients [8]. Furthermore, AS has been proposed as an option for selected intermediate-risk PCa patients with low-volume Gleason Grade Group (GGG) 2. Several studies have demonstrated the oncologic safety of AS relative to aggressive treatment [9,10,11]. However, biopsy tends to underestimate the true GGG, so some patients enrolled in AS do not actually meet the enrollment criteria. Up to 25% of patients with biopsy GGG 1–2 PCa who qualify for AS harbour adverse pathology (AP: pT3 and/or N1 and/or GGG ≥ 3) at radical prostatectomy (RP) [12, 13]. If enrolled in AS, these patients may miss the opportunity for curative treatment because of disease progression [14, 15]. Additionally, the inflexibility of risk categories may limit the number of patients with pathologically indolent PCa who qualify for AS, increasing the risk of overtreatment [16]. Recently, several multivariate models and nomograms based on clinical and multiparametric magnetic resonance imaging (mpMRI) features have been developed to overcome these limitations and have shown superior diagnostic efficacy compared to traditional risk categories [17, 18]. However, the diagnostic accuracy of these models is still controversial, and the decision to perform a prostate biopsy to confirm PCa still relies largely on PSA-based criteria [19].

Positron emission tomography/computed tomography (PET/CT) with [68Ga]-labeled prostate-specific membrane antigen inhibitors ([68Ga]Ga-PSMA) has been widely used in the clinical staging of primary PCa and the restaging of biochemically recurrent PCa [20, 21]. Previous studies have demonstrated a strong positive correlation between the maximal standardized uptake value (SUVmax) of [68Ga]Ga-PSMA PET and GGG for primary prostate tumors, indicating its potential to predict pathological upgrade from biopsy to RP [22]. This molecular imaging technique could prove more advantageous than mpMRI in screening patients for AS [15]. In recent years, the field of radiomics has progressed rapidly, making it possible to extract quantitative data from digitally encoded medical images and thereby obtain additional information on lesions [23]. The combination of radiomics and machine learning has been able to predict the postoperative GGG of PCa accurately and non-invasively [24]. Notably, unlike tumor biopsies, radiomics has the potential to characterize the local tumor phenotype based on the entire lesion rather than on tumor subsamples.

Our study aims to develop and validate a stratified machine learning model that combines [68Ga]Ga-PSMA PET/CT with traditional clinical risk factors. This model will be used to predict postoperative AP in patients with GGG 1–2 at biopsy, aiding in selecting patients for AS.

Materials and methods

Patients

The study protocol was approved by the Ethics Committee of **. […] [29]. All VOIs were normalized, discretized using a fixed bin width (FBW = 0.25), and then resampled to 2.0 × 2.0 × 2.0 mm³ voxels before feature extraction. A total of 107 3D radiomic features were extracted and categorized into seven feature classes: shape (n = 14), first order (n = 18), Gray Level Co-occurrence Matrix (GLCM) (n = 24), Gray Level Dependence Matrix (GLDM) (n = 14), Gray Level Run Length Matrix (GLRLM) (n = 16), Gray Level Size Zone Matrix (GLSZM) (n = 16), and Neighbouring Gray Tone Difference Matrix (NGTDM) (n = 5).
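As a rough illustration of the discretization step, the sketch below bins intensities with a fixed bin width of 0.25 and checks that the listed feature classes add up to 107. The indexing formula (bin = floor(x / w) − floor(min / w) + 1) is a common fixed-bin-width convention; whether the software used in the study applies exactly this convention is an assumption, and the sample intensities are invented for illustration.

```python
import math

BIN_WIDTH = 0.25  # fixed bin width (FBW) stated in the text


def discretize_fixed_bin_width(values, bin_width=BIN_WIDTH):
    """Map each intensity to a 1-based bin index using a fixed bin width."""
    lowest_edge = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - lowest_edge + 1 for v in values]


# Hypothetical normalized intensities inside one VOI (illustrative only).
sample_voi = [0.10, 0.30, 0.55, 0.74, 1.20]
bins = discretize_fixed_bin_width(sample_voi)

# Feature-class counts as listed in the text.
FEATURE_CLASSES = {
    "shape": 14,
    "first order": 18,
    "GLCM": 24,
    "GLDM": 14,
    "GLRLM": 16,
    "GLSZM": 16,
    "NGTDM": 5,
}
total_features = sum(FEATURE_CLASSES.values())

print(bins, total_features)
```

With a fixed bin width, the number of bins grows with the intensity range of each VOI, which is why the bin width rather than the bin count is the reported parameter.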

After radiomics feature extraction, features were selected in two steps. First, the minimum redundancy maximum relevance (mRMR) algorithm, which has been shown to be effective in radiomics feature selection [30], was applied to eliminate redundant and irrelevant features. Then, least absolute shrinkage and selection operator (LASSO) regression was used to choose the optimal subset of features for the final model.
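The idea behind the mRMR step can be sketched as a greedy loop that repeatedly picks the feature with the best relevance-minus-redundancy score. This is a simplified variant, not the authors' implementation: it uses absolute Pearson correlation in place of mutual information, the feature names and values are hypothetical, and the subsequent LASSO step is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(features, labels, k):
    """Greedy mRMR (difference form) over a dict of name -> value list."""
    relevance = {
        name: abs(pearson(values, labels)) for name, values in features.items()
    }
    selected = []
    remaining = sorted(features)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            # Mean absolute correlation with the already selected features.
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: feature_b is nearly a copy of feature_a.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_features = {
    "feature_a": [1, 2, 3, 4, 5, 6],
    "feature_b": [1, 2, 3, 4, 5, 7],
    "feature_c": [0, 1, 0, 1, 0, 1],
}
chosen = mrmr_select(toy_features, toy_labels, k=2)
print(chosen)
```

In the toy data, the near-duplicate feature is skipped in favour of a less relevant but less redundant one, which is the behaviour mRMR is designed to produce.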

Model construction

For model development and assessment, patients were randomly divided into training and testing groups at a ratio of 7:3. All models were developed on the training cohort and subsequently evaluated on the testing cohort.

The clinical model was constructed in two steps based on the clinical features. First, univariate logistic regression was performed to assess clinical features including age, PSA, free PSA (FPSA), prostate volume, PSA density (PSAD), FPSA/total PSA (TPSA), biopsy GGG, % of positive cores, SUVmax, SUVmean, PSMA-TL and PSMA-TV. Then, features with P < 0.05 in the univariate logistic analysis were entered into a multivariate logistic regression analysis to build the model.

We used a logistic regression classifier to build the radiomics model based on the selected radiomics features. A stratified tenfold cross-validation was applied with 100 iterations in the training group to develop a reliable and stable model, and the model was then assessed in the testing group. Radiomics score (Radscore) was calculated for each patient via a linear combination of selected features that were weighted by their respective coefficients.
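A stratified tenfold split can be sketched as follows. The fold-assignment scheme (shuffle within each class, then deal indices round-robin) is one common way to do this and is not necessarily what the authors' software does. Reading "100 iterations" as 100 repeats with different shuffles is also an assumption. The class sizes are the training-cohort counts reported in the Results.

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds so each fold keeps the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal indices round-robin so each class spreads evenly over folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Training cohort: 21 patients with AP (label 1) and 32 without (label 0).
train_labels = [1] * 21 + [0] * 32

# Assumption: 100 repeats, each with a different random shuffle.
repeats = [stratified_folds(train_labels, seed=r) for r in range(100)]

first_repeat = repeats[0]
ap_per_fold = [sum(train_labels[i] for i in fold) for fold in first_repeat]
print(ap_per_fold)
```

Each fold serves once as the validation set while the remaining nine are used for fitting; stratification keeps two or three AP cases in every fold despite the small cohort.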

For the combined model, clinical features with P < 0.05 in univariate logistic regression and the Radscore were entered into multivariate logistic regression, and the statistically significant indicators were used to establish a visualized quantitative model, the nomogram.

Statistical analysis

The t-tests/Spearman rank tests and Chi-square/Fisher's exact tests were used to compare the clinical features between men with AP and those without. Univariate and multivariate logistic regression analyses were performed to determine independent predictors and then to build the prediction models.

The area under the receiver operating characteristic (ROC) curve (AUC), decision curve analysis (DCA), and calibration curves were used to assess the diagnostic value, clinical utility, and predictive accuracy of the models, respectively. Statistical analysis was performed using IBM SPSS Statistics software, version 26.0, and R software, version 4.1.3. P < 0.05 was considered statistically significant.
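For readers less familiar with the AUC, the following sketch computes it directly from its rank-based definition: the proportion of (AP, non-AP) patient pairs in which the AP patient receives the higher score, with ties counted as one half. The scores and labels are invented for illustration, and this is not the R or SPSS routine used in the study.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical model outputs for three AP and three non-AP patients.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)
print(round(toy_auc, 3))
```

Here eight of the nine possible pairs are ordered correctly, so the AUC is 8/9.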

Results

Patient characteristics

A total of 75 patients with biopsy GGG 1–2 PCa were included in this study. At final pathology after RP, 30 patients (40%) had AP. The 30 patients with AP at RP were randomly divided into the training cohort (n = 21) and the testing cohort (n = 9). Of the 45 patients without AP at RP, 32 were assigned to the training cohort and 13 to the testing cohort. There were no significant differences in any clinical or imaging features between the training and testing cohorts. Table 1 shows the characteristics of all patients in detail.
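The cohort sizes above are consistent with a 7:3 split performed separately within the AP and non-AP groups. The sketch below reproduces the counts under that assumption. Rounding half up (so that 31.5 becomes 32) and the random seed are illustrative choices, not details taken from the paper. Integer arithmetic is used to avoid floating-point rounding surprises.

```python
import random


def stratified_split(labels, train_parts=7, total_parts=10, seed=42):
    """Split indices per class at train_parts:(total_parts - train_parts)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Round half up using integers: 45 * 7 / 10 = 31.5 becomes 32.
        n_train = (2 * len(indices) * train_parts + total_parts) // (
            2 * total_parts
        )
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


# 30 patients with AP (label 1) and 45 without AP (label 0).
all_labels = [1] * 30 + [0] * 45
train_idx, test_idx = stratified_split(all_labels)

train_ap = sum(all_labels[i] for i in train_idx)
test_ap = sum(all_labels[i] for i in test_idx)
print(len(train_idx), train_ap, len(test_idx), test_ap)
```

This yields 53 training patients (21 with AP) and 22 testing patients (9 with AP), matching the reported cohorts.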

Table 1 Patient characteristics

Clinical model

Table 2 compares the clinical features of patients with and without AP. The univariate logistic regression analysis showed significant differences in FPSA/TPSA, SUVmax, SUVmean, PSMA-TL, and PSMA-TV between the two groups (P < 0.05). Subsequently, the significant variables from the univariate analysis were included in the multivariate logistic regression model. The results showed that FPSA/TPSA (odds ratio [OR]: 0.00, 95% confidence interval [CI]: 0.00–0.57) and PSMA-TV (OR: 1.29, 95% CI: 1.06–1.58) were independent predictors of adverse pathology. Finally, the clinical model was established using FPSA/TPSA and PSMA-TV. As shown in Table 3, the AUC, sensitivity, and specificity were 0.821 (0.695–0.947), 76.2% (58.0%–94.4%), and 81.2% (67.7%–94.8%), respectively, in the training group and 0.795 (0.603–0.987), 77.8% (50.6%–100%), and 69.2% (42.4%–87.3%), respectively, in the testing group.
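The paper does not state how the confidence intervals for sensitivity and specificity were computed. As a worked check, the sketch below shows that several of the reported intervals are numerically consistent with a normal-approximation (Wald) interval truncated to [0, 1], while the testing-group specificity interval is closer to a Wilson score interval. The counts (16/21, 26/32, 7/9, 9/13) are back-calculated from the reported percentages and cohort sizes and are therefore assumptions.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Normal-approximation interval for a proportion, clipped to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    )
    return p, center - half_width, center + half_width


# Counts inferred from the reported percentages (assumed, not from the paper).
sens_train = wald_interval(16, 21)   # reported 76.2% (58.0%-94.4%)
spec_train = wald_interval(26, 32)   # reported 81.2% (67.7%-94.8%)
sens_test = wald_interval(7, 9)      # reported 77.8% (50.6%-100%)
spec_test = wilson_interval(9, 13)   # reported 69.2% (42.4%-87.3%)

for name, (p, low, high) in [
    ("train sensitivity", sens_train),
    ("train specificity", spec_train),
    ("test sensitivity", sens_test),
    ("test specificity", spec_test),
]:
    print(f"{name}: {100 * p:.1f}% ({100 * low:.1f}%-{100 * high:.1f}%)")
```

With only 9 to 32 patients per group, either method gives wide intervals, which is worth keeping in mind when comparing models in the testing cohort.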

Table 2 Univariate and multivariate Logistic analysis of clinical factors for predicting patients with adverse pathology
Table 3 Diagnostic performance of three models in training and testing cohorts

Radiomics model

A total of 107 Image Biomarker Standardization Initiative (IBSI) compliant radiomic features were extracted from whole prostate PET images. Among them, 30 radiomic features were retained by mRMR. Then, the optimal adjustment weight λ (λ = 0.0672759851974577) was determined for the LASSO algorithm (Figure S1), and 6 nonzero coefficient features were selected to construct the final radiomic model. Figure S2 shows the detailed names and weights of the 6 radiomics features.

Radscore was calculated by multiplying each selected feature value by its coefficient and summing the products. The Radscores for all patients are shown in Fig. 1. In both the training and testing cohorts, patients with AP had a higher Radscore than patients without AP, and the Radscore discriminated well between the two groups. The radiomics model yielded an AUC, sensitivity, and specificity of 0.830 (0.720–0.941), 90.5% (77.9%–100%), and 68.8% (52.7%–84.8%), respectively, in the training group. In the testing group, the radiomics model showed the same sensitivity as the clinical model, 77.8% (50.6%–100%), and a higher specificity of 92.3% (77.8%–100%), although the 95% CIs overlapped, with an AUC of 0.829 (0.624–1.000) (Table 3).
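The Radscore calculation is a weighted sum, which can be sketched as below. The feature names, coefficients, and patient values are hypothetical placeholders and are not the six features or weights from Figure S2. The optional intercept and the logistic conversion to a probability are assumptions about how a logistic-regression-based score would typically be used.

```python
import math

# Hypothetical coefficients (NOT the values reported in the study).
COEFFICIENTS = {"feature_1": 0.8, "feature_2": -0.5, "feature_3": 1.2}


def radscore(feature_values, coefficients=COEFFICIENTS, intercept=0.0):
    """Linear combination of feature values weighted by their coefficients."""
    return intercept + sum(
        coefficients[name] * feature_values[name] for name in coefficients
    )


def logistic(score):
    """Convert a linear score to a probability (assumed logistic link)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical standardized feature values for one patient.
patient = {"feature_1": 1.0, "feature_2": 2.0, "feature_3": 0.5}
patient_score = radscore(patient)          # 0.8 - 1.0 + 0.6
patient_probability = logistic(patient_score)
print(round(patient_score, 3), round(patient_probability, 3))
```

A higher Radscore maps to a higher predicted probability of AP, in line with the group difference described above.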

Fig. 1

Bar diagrams of Radscore for each patient in the training cohort (a) and testing cohort (b). The red bars are Radscore values for patients with adverse pathology at final pathology, and the green bars are Radscore values for patients with favorable disease at final pathology. Radscore, radiomics score

Combined model

Clinical features with statistically significant differences between the two groups and the Radscore were included in multivariate logistic regression to establish a combined model. The results showed that FPSA/TPSA (OR: 0.00, 95% CI: 0.00–2.21) and Radscore (OR: 9.92, 95% CI: 2.72–36.23) were the significant independent predictors of AP. A nomogram based on the combined model, including FPSA/TPSA and Radscore, is shown in Fig. 2. ROC curve analysis showed that the AUC values for the combined model were 0.875 (0.780–0.970) and 0.872 (0.678–1.000) in the training and testing cohorts, respectively, with good sensitivity and specificity (Table 3).
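Odds ratios from logistic regression are exponentiated coefficients, so the reported OR and CI for Radscore can be converted back to an approximate coefficient and standard error. The sketch below does this under the usual assumption that the CI was computed as exp(β ± 1.96 × SE); the paper does not state the method, so the recovered values are approximate.

```python
import math


def coefficient_from_odds_ratio(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the log-odds coefficient and its standard error."""
    beta = math.log(odds_ratio)
    standard_error = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, standard_error


# Radscore in the combined model: OR 9.92 (95% CI 2.72-36.23).
beta_radscore, se_radscore = coefficient_from_odds_ratio(9.92, 2.72, 36.23)
wald_z = beta_radscore / se_radscore
print(round(beta_radscore, 3), round(se_radscore, 3), round(wald_z, 2))
```

The same conversion cannot be applied to the FPSA/TPSA odds ratio as printed, because a value rounded to 0.00 has no finite logarithm. FPSA/TPSA is a ratio between 0 and 1, so a one-unit change spans its whole range, which would explain an extremely small odds ratio.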

Fig. 2

Nomogram based on the combined model predicting AP at RP, among patients with biopsy GGG 1–2 PCa. AP adverse pathology, RP radical prostatectomy, GGG Gleason Grade Group, PCa prostate cancer, Radscore radiomics score, FPSA free prostate-specific antigen, TPSA total prostate-specific antigen

The comparison and evaluation of the three models

A comparison of the ROC curves of the three models is shown in Figure S3. The combined model had the highest AUC among the three models, with the highest sensitivity and moderate specificity in both the training and testing cohorts, although the 95% CIs overlapped. Hosmer–Lemeshow calibration curves for the three predictive models were constructed in the training and testing groups. Calibration was considered good if the predicted probabilities on the calibration curve closely followed the observed probabilities and the P-value of the Hosmer–Lemeshow test was greater than 0.05. In our study, the coloured lines (calibration curves) agreed closely with the dotted reference lines in Fig. 3. In addition, the P-values of the clinical, radiomics, and combined models were 0.303, 0.593, and 0.445 in the training group and 0.465, 0.598, and 0.685 in the testing group, respectively. These results indicate good agreement between predicted and observed outcomes. As shown in Fig. 4, DCA was performed to compare the clinical utility of the three models in predicting AP. The net benefit of the combined model and the radiomics model was greater than that of the clinical model.
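Decision curve analysis compares models by their net benefit at a chosen threshold probability, using the standard definition: net benefit = TP/n − (FP/n) × pt/(1 − pt). The sketch below applies it to invented predictions. The data and the 0.4 threshold are illustrative, and the code is not the R routine used in the study.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of classifying as positive when probability >= threshold."""
    n = len(labels)
    true_pos = sum(
        1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1
    )
    false_pos = sum(
        1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0
    )
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: classify every patient as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predicted AP probabilities and observed outcomes.
toy_probabilities = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.8, 0.1]
toy_outcomes = [1, 1, 0, 1, 0, 0, 0, 0]

nb_model = net_benefit(toy_probabilities, toy_outcomes, threshold=0.4)
nb_all = net_benefit_treat_all(toy_outcomes, threshold=0.4)
print(round(nb_model, 4), round(nb_all, 4))
```

A decision curve plots these values across a range of thresholds; a model is clinically useful where its curve lies above both the treat-all and treat-none (net benefit of zero) strategies.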

Fig. 3

The calibration curves of the three models in the training cohort (a) and testing cohort (b). The P-values of the clinical model, radiomics model, combined model were 0.303, 0.593, 0.445 in the training group, and 0.465, 0.598, 0.685 in the testing group

Fig. 4

DCA of the clinical, radiomics, and combined models for predicting AP at final pathology in the testing cohort. DCA decision curve analysis, AP adverse pathology

Discussion

In this study, we developed three models based on clinical and/or [68Ga]Ga-PSMA PET-based radiomics features to refine the inclusion criteria for AS in patients with biopsy Gleason Grade Group 1–2 PCa, which may have important clinical value in reducing overtreatment and in avoiding the enrollment of patients with adverse pathology whose disease may progress. The combined model demonstrated superior predictive ability compared to the clinical and radiomics models alone, specifically in identifying patients with adverse pathology at final pathology, who should not be considered for AS. Internal validation results suggest that the combined model can meet the clinical requirements for selecting appropriate AS candidates while making fuller use of [68Ga]Ga-PSMA PET/CT scans.

Currently, guidelines stratify PCa patients based on biopsy results, PSA levels, and clinical stage, with AS primarily recommended for low-risk patients [8]. However, the strict inclusion criteria of AS limit the inclusion of suitable patients. Moreover, the limited number of included indicators omits some patient information, leading to the inclusion of patients who may not be appropriate for AS, resulting in delays in their treatment [16, 31]. To extend the AS inclusion criteria and reduce the inclusion of unsuitable patients, several predictive models based on clinical characteristics and conventional imaging features have been developed in recent years. Gandaglia et al. [32] developed a multivariable model using patients' PSA levels, clinical stage, biopsy grade group, number of positive cores, and PSA density to assess the risk of poor outcomes in low-risk or intermediate-risk PCa patients, aiding in the selection of AS candidates. The results demonstrated a 10% increase in the number of patients eligible for AS compared to PRIAS criteria, without increasing the risk of misclassification [33]. However, the diagnostic efficacy of this model remains poor, possibly due to the absence of specific imaging characteristics from prostate MRI and [68Ga]Ga-PSMA PET/CT. Another study, which developed a multivariable model including variables from MRI and targeted biopsy, validated that the inclusion of MRI features significantly enhanced the diagnostic performance of the model for adverse pathology [16]. This improvement may be attributed to the correlation between PI-RADS scores and adverse pathology, as well as the more accurate pathology obtained through MRI-targeted biopsy. Previous studies have shown that [68Ga]Ga-PSMA PET/CT is a more accurate predictor of adverse pathological outcomes compared to mpMRI [15, 34]. 
In addition, previous studies have shown that PSMA PET-targeted biopsy, combined with intraoperative quantification of PSMA PET uptake in core biopsies, could improve the detection rate of clinically significant PCa (csPCa) compared with systematic biopsy and reduce the need for saturation biopsy [35]. In our study, we developed a clinical model based on patients' clinical characteristics and conventional PSMA PET/CT features (FPSA/TPSA and PSMA-TV), which demonstrated improved diagnostic performance. However, the lesion features provided by visual assessment of PSMA PET/CT are limited, and the acquisition of some features is subjective.

Radiomics can extract, in a high-throughput and quantitative manner, features that cannot be obtained through visual evaluation by clinicians. This can improve the accuracy of diagnosis, prognosis, and prediction [36, 37]. Currently, research on using radiomics data to select patients for AS is limited and predominantly based on MRI [38,39,40]. Compared to models using clinical and visually assessed imaging features, radiomics models based on MRI often exhibit similar or slightly lower diagnostic performance [38, 39]. This may stem from the inherent difficulty of differentiating PCa pathologies on MRI. PET-derived radiomics features are noteworthy as potential non-invasive biomarkers for predicting treatment outcomes and characterizing tumor biology [41]. Specifically, radiomic features derived from [68Ga]Ga-PSMA-11 PET/CT images have shown remarkable proficiency in discerning Gleason scores [24]. To our knowledge, our study is the first to apply [68Ga]Ga-PSMA PET radiomics to the selection of patients for AS. Encouragingly, our results show that the radiomics model based on PSMA PET outperformed the clinical model in both the training and testing sets, supporting the ability of [68Ga]Ga-PSMA PET to identify adverse postoperative pathology in candidates for AS.

Advances in technology have revolutionized the management of PCa, with mounting evidence supporting the adoption of sophisticated tests and comprehensive features to individualize patient assessment and ensure optimal treatment [42]. The proposed radiomics-based analysis incorporating clinical-radiographic features could provide a noninvasive biomarker for individualized, precise treatment of patients [40]. In our study, we developed a predictive model incorporating PSMA PET radiomics features and FPSA/TPSA, which exhibited superior diagnostic performance compared to both the clinical and radiomics models, with consistent results in the testing set. This underscores the complementary nature of clinical and radiomics features and the increased robustness achieved through their combination. Current major clinical guidelines, including the European Association of Urology (EAU) guideline, recommend active surveillance as the treatment of choice for patients with low-risk prostate cancer. In our internal validation cohort, using our combined model to select AS candidates would allow for an 83.3% increase in the number of patients eligible for AS without increasing the risk of adverse pathological characteristics compared to the EAU criteria. Our combined model also showed excellent calibration characteristics at internal validation. Notably, among patients with a predicted risk of AP greater than 40%, the model would underestimate the actual risk of AP, so some individuals with AP might receive an AS regimen. Among patients with a predicted risk of AP less than 40%, the model would overestimate the actual risk of AP, so some individuals without AP might be excluded from an AS regimen. Furthermore, we used a nomogram to enhance the visualization and clinical utility of the model, providing a clear representation of the impact of each factor on the target event for individual patients.
In clinical practice, the nomogram can facilitate the scoring of patients based on their clinical and radiomics features, enabling assessment of their probability of harboring adverse pathology. As a result, it can guide clinicians in selecting appropriate AS patients. Radiomics features are arguably more dependent on the underlying image data. A recent meta-analysis suggests that [68Ga]Ga-PSMA and [18F]F-DCFPyL PET have comparable diagnostic performance in patients with suspected prostate cancer [43]. A study including 160 men found that the SUVmax of [18F]F-PSMA and [68Ga]Ga-PSMA did not differ (P > 0.05) in local recurrence or primary prostate cancer [44]. These results seem to indicate that there is no difference in the uptake of [18F]- and [68Ga]-labeled PSMA ligands in prostate cancer PET scans. However, the [68Ga]-labeled PSMA ligand used in these studies is [68Ga]Ga-PSMA-11, and there are limited data on [68Ga]Ga-PSMA-617. In addition, no studies have investigated whether PET-based radiomics features remain valid across different PSMA ligands. Further multicentre, large-scale studies will be required to establish the accuracy and wider applicability of the radiomic signature proposed here.

This study has several limitations. First, the sample size was relatively small because of strict inclusion criteria, and external validation is lacking, which may restrict the generalizability of our results. Future validation would benefit from additional multicenter, large-scale studies. Second, the study's endpoint of AP at RP is a surrogate outcome for cancer-specific survival in AS patients. However, this limitation is common to most studies, as intermediate-risk patients are usually offered active treatment. Finally, this study did not incorporate MRI-related visual assessments and radiomics features because the MRI examinations for most patients were conducted at external hospitals, which could result in substantial differences in MRI acquisition and interpretation.

Conclusions

In conclusion, we developed the first model combining PSMA PET-derived radiomics and clinical features to identify candidates for AS. The model has the potential to aid the safe selection of Gleason Grade Group 1–2 patients for AS, increasing the absolute proportion of men eligible for AS and decreasing their risk of overtreatment.