Introduction

Triple-negative breast cancer (TNBC) is defined as negative for estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 (HER2)1,2. TNBC is an aggressive cancer typically treated with neoadjuvant systemic therapy (NAST)3. Among patients with TNBC treated with NAST, those with a pathologic complete response (pCR) have better clinical outcomes4,5,6. Thus, noninvasive prediction of pCR status early in the course of NAST is of high importance. Patients likely to have incomplete or no response (a non-pCR) to NAST can be triaged to clinical trials of novel therapies7,8, and patients likely to achieve pCR can continue standard-of-care treatment or qualify for investigational treatment de-escalation6,9.

The most accurate diagnostic imaging modality for baseline staging of breast cancer and assessment of breast cancer response to NAST is dynamic contrast-enhanced MRI (DCE MRI)10,11,12. Diffusion-weighted MRI (DWI) with quantitative apparent diffusion coefficient (ADC) complements DCE MRI and can provide additional functional information13,14.

Radiomic features from DCE MRI and DWI images, i.e., quantitative features that describe characteristics of tumors and their surroundings and are extracted from MR images by use of computer algorithms15, can help predict the response of breast cancer to treatment15. DCE MRI radiomic features, such as Haralick gray-level co-occurrence matrix (GLCM)-based features, characterize tumor enhancement heterogeneity, which has been shown to correlate with aggressive growth and poor prognosis of breast cancer and can be helpful for noninvasive prediction of breast cancer response to treatment16,17,18. Several studies have also shown that DWI radiomic features can be used for breast lesion diagnosis18, to distinguish between benign and malignant breast lesions19, to differentiate TNBC from other breast cancer subtypes20, and potentially for predicting breast cancer response to treatment14,21.

As DCE MRI and DWI are the most utilized MRI diagnostic sequences, there is interest in investigating the combination of DCE MRI and DWI radiomic features to improve assessment of tumor biology before and during treatment to better guide treatment22. Functional MRI techniques included in the multiparametric imaging of breast tumors provide more complete pathophysiologic information about breast cancer23 that can improve diagnostic accuracy and assessment of response to NAST24.

The purpose of our study was to determine if radiomic signatures based on multiparamteric DCE MRI and DWI images obtained early during NAST can predict pCR in patients with TNBC.

Results

Patient characteristics and pCR status

The study included 163 patients with TNBC who underwent NAST and had MRI at baseline (BL) and after 2 cycles (C2) and after 4 cycles (C4) of NAST with an MRI protocol consisting of an axial T2-weighted sequence, a DCE MRI sequence, and a DWI sequence. Seventy-eight patients (48%) had pCR, and 85 (52%) had non-pCR. Data from 109 patients were used as the training set, and data from the other 54 patients were used as the testing set, following a 2:1 training to testing sample size ratio. The training set included 52 (32%) patients with pCR and 57 (35%) patients with non-pCR and the testing set included 26 (16%) patients with pCR and 28 (17%) patients with non-pCR. Patient characteristics are summarized in Table 1

Table 1 Characteristics of patients with TNBC who received NAST by pCR status*

Imaging features associated with pCR status by univariate analysis

Out of 10 first-order radiomic features and 300 GLCM radiomic features, univariate analysis identified 131 imaging features from both DCE MRI and DWI that predicted pCR status with area under the receiver operating characteristics curve (AUC) ≥ 0.7 in both the training and testing sets. Specifically, 25 first-order radiomic (histogram) features from the DCE images (AUC 0.70–0.85 for the training set and 0.70–0.81 for the testing set; Supplementary Table 1) and 106 GLCM features from the DWI images (AUC 0.70–0.79 for the training set and 0.70–0.81 for the testing set; Supplementary Table 2) at both the C2 and C4 time points and changes between C4 and BL, C4 and C2, and C2 and BL were statistically significant with p < 0.001. Radiomic features from BL DCE MRI and DWI images showed similar yet consistently worse performance in univariate analysis; all AUCs were less than 0.63 for DCE MRI and less than 0.65 for DWI (data not shown).

Performance of the radiomic models by multivariate analysis

Thirty-six models from multivariate analysis based on both DCE MRI and DWI had AUC > 0.7 in both the training and testing sets (p < 0.003), and 18 of these models had AUC > 0.75 in both the training and testing sets (p < 0.001). The 3 top-performing models combined radiomic features corresponding to relative difference between BL and C2, absolute difference between BL and C4, and C4 features from DCE MRI and DWI images and all had AUC > 0.79 in both the training and testing sets (Table 2, Fig. 1). The best model combined 35 radiomic features corresponding to relative difference between BL and C2 and had AUC = 0.91 in the training set and AUC = 0.80 in the testing set (p < 0.001). The second-best model combined 44 radiomic features of absolute difference between BL and C4 and had AUC = 0.90 in the training set and AUC = 0.80 in the testing set (p < 0.001). The third-best model combined 18 radiomic features from C4 and the absolute difference between BL and C4 and had AUC = 0.82 in the training set and AUC = 0.79 in the testing set (p < 0.001).

Table 2 Top-performing multivariate models for pCR prediction.
Figure 1
figure 1

ROC curves of the three best-performing radiomic models for pCR prediction. Testing cohort using logistic regression with elastic net (black solid line). Entire cohort using threefold cross validation logistic regression with elastic net (blue dashed line), and threefold cross validation support vector machine (SVM) with linear kernel (red dotted line). FPR, false positive rate.

The threefold cross-validation without stratified splitting had similar and comparable AUC to training/testing-based AUC. The first model had an AUC of 0.75, the second model had an AUC of 0.78, and the third model had an AUC of 0.76 (Fig. 1). As a comparison to the elastic net-based logistic regression, support vector machine (SVM) with a linear, radial basis function (RBF), or Gaussian kernel, yielded AUC of <  = 0.72 for all the 3 models. Thus, the logistic model had a better performance than SVM (Supplementary Table 4).

Inter-reader and intrareader agreement

Pearson correlation analysis for features extracted from DCE images showed that both GLCM features and first-order features had excellent correlation between readers and that GLCM features had better correlations than first-order features. The Pearson correlation coefficient was > 0.8 for 25 of 30 (83%) first-order features (7 BL, 10 C2, and 8 C4), 18 of 30 (60%) absolute difference first-order features (6 C2/BL, 7 C4/BL, and 5 C4/C2), 20 of 30 (67%) relative difference first-order features (8 C2/BL, 8 C4/BL, and 4 C4/C2), 300 of 300 (100%) GLCM features, 300 of 300 (100%) absolute difference GLCM features, and 271 of 300 (90.3%) relative difference GLCM features. As for the ratio of inter-reader variance to intrareader variance for the first-order features, the mean value was 0.006, the median value was 0, the range was 0 to 0.077, and the standard deviation was 0.016, indicating high agreement between the 2 readers. The ratios of inter-reader variance to intrareader variance for all of the GLCM features were zeros rounding to the nearest ten thousandth, indicating little variability between the 2 readers for any of the GLCM features. Overall, inter-reader variability was only minimally higher than intrareader variability. Furthermore, pCR prediction models extracted from DCE images showed similar AUC values for the 2 readers (Supplementary Table 3).

Discussion

In this study, we investigated the performance of radiomic signatures derived from multiparametric MRI images obtained early during NAST for prediction of treatment response in patients with TNBC. Top performance was achieved by models that included radiomic features from DCE MRI and DWI images acquired at C4, changes between BL and C2, and changes between BL and C4. These models had high inter-reader agreement and therefore could potentially be used as a quantitative tool to predict the pCR to NAST.

Among patients with TNBC, those with a pCR to NAST have a much better prognosis than those with a non-pCR. A method to accurately predict the response to NAST early in the treatment course would allow patients predicted to have a pCR to continue the treatment or have treatment de-escalation and patients predicted to have a non-pCR to be switched to different therapy or offered novel targeted trials.

Results from several studies showed that radiomic features may be useful for breast tumor detection and for prediction of treatment response, tumor molecular subtype, and axillary lymph node metastases in patients with breast cancer25,26,27,28,29,30,31. Combined application of different imaging sequences was found to be superior to use of a single sequence in prediction of pCR after NAST32,33,34.

A study by Chen et al. included 91 patients with different breast cancer molecular subtypes and 396 radiomic texture features from multiparametric MRI. The models based on pretreatment DCE MRI and ADC data were able to predict pCR with greater accuracy than the models based on either DCE MRI or ADC alone33. **. Estrogen receptor and progesterone receptor status were evaluated using immunohistochemical staining, and tumors were defined as negative for these receptors if fewer than 10% of invasive tumor cells exhibited positive nuclear staining for these receptors. HER2 was defined as negative per the American Society of Clinical Oncology/College of American Pathologists guidelines37. Upon completion of NAST, patients underwent surgical resection with assessment of residual disease by dedicated breast pathologists. pCR was defined as absence of residual invasive disease in the breast and the axillary lymph nodes.

Neoadjuvant systemic therapy

NAST consisted of dose-dense AC (doxorubicin and cyclophosphamide) for 4 cycles followed by paclitaxel every 2 weeks for 4 cycles or weekly for 12 doses. Patients with suboptimal responses who received experimental treatment were excluded to ensure homogeneity of the cohort.

MRI data acquisition

Breast MRI was performed with patients in the prone position using a GE 3.0-T MR750w whole body scanner (Waukesha, WI) with an 8-channel phased array bilateral breast coil. The MRI protocol consisted of axial T2-weighted series, DCE MRI series, and DWI series. The DCE series was performed using the Differential Subsampling with Cartesian Ordering (DISCO) sequence with bipolar readouts for 2-point Dixon processing to produce water-only and fat-only images for each acquired slice.

Typical scan parameters used for the DISCO acquisition were as follows: field of view = 34 × 34 cm, flip angle = 12º, repetition time = 7.6 ms, echo time 1/echo time 2 = 1.1/2.3 ms, total acquisition time = 7 min, slice thickness = 3.0 mm, slice spacing = -1.5 mm, number of acquired slices = 60–115, matrix = 320 × 320, temporal resolution = 8–15.5 s, receiver bandwidth =  ± 166.7 kHz, number of excitations = 0.69, and autocalibrating reconstruction for Cartesian imaging factor = 3. During DCE series, a single bolus of gadobutrol contrast agent (Gadovist, Bayer Health Care) was injected (0.1 mL/kg at ~ 2 mL/second followed by saline flush) after at least 1 mask phase was obtained. Serial subtraction images were generated during postprocessing. Early subtraction images relative to the mask series were generated at approximately 2.5 min after injection.

DWI images were acquired prior to DCE MRI with a reduced field-of-view sequence (FOCUS DWI). Compared to conventional DWI, FOCUS DWI allows a shorter echo train for a desired resolution, with reduced image blurring and reduced artifacts. Typical scan parameters for FOCUS DWI were as follows: TE/TR = 70/4000 ms, matrix size = 80 × 80, field of view = 16 × 16 cm2, number of slices = 13–16, slice thickness = 4 mm, slice gap = 0 mm, scan duration = 5 min. The b-values and corresponding numbers of signal averages were 100 (4) and either 800 (16) or 1000 (16) sec/mm2.

Image preprocessing, tumor segmentation, and volume extraction

For tumor segmentation, the phase at 2.5 min after injection of contrast medium from the DISCO fast DCE MRI acquisition was selected as an early phase for segmentation. ADC maps were generated from the acquired FOCUS DWI images using a mono-exponential model. If there were multiple lesions, the largest lesion identified on DCE images was considered the index carcinoma. Tumor contouring was performed by 2 breast radiologists with 8 years (M.B.) and 5 years (R.M.) of experience who were blinded to the patient clinical outcomes and in the order of trial enrollment, on the BL, C2, and C4 images using an in-house image analysis software program (Image-I)38,39. Tumors were first manually defined across all slices on the 2.5-min early subtraction images and b = 800 DWI images, and all volumes of interest (VOIs) were semi-automatically segmented using the histogram thresholding in Image-I. Tumor necrotic regions and clip artifacts were segmented out of the VOIs to ensure proper phenotype border selection. If tumors were not visible on the C2 scan or the C4 scan, the tumor bed was contoured (Fig. 3). The inter-reader agreement was analyzed between the VOIs of DCE maps obtained from the 2 breast radiologists (M.B. and R.M.) following the same procedure.

Figure 3
figure 3

Tumor contouring. (A) Fifty-five-year-old woman with triple-negative invasive ductal carcinoma of the right breast. DCE, DWI and ADC images show a segmented irregular mass at 7 o’clock (arrows) that measured 2.1 × 1.9 × 1.4 cm at BL, 1.7 × 1.5 × 1.2 cm at C2, and 1.5 × 1.5 × 1 cm at C4. Histopathologic assessment at surgery revealed non-pCR. (B) Fifty-one-year-old woman with triple-negative invasive ductal carcinoma of the left breast. DCE, DWI and ADC images show a segmented irregular mass at 2 o’clock (arrows) that measured 3 × 2.1 × 2.1 cm at BL and had decreased in size to 1.3 × 1.2 × 0.9 cm at C2. At C4, there was complete resolution of the mass and tumor bed was segmented (arrows). Histopathologic assessment at surgery revealed pCR.

Radiomic analysis

Radiomic features were extracted using an in-house software package written in MATLAB (MathWorks Inc, Natick, MA). Within the segmented VOI, 10 first-order radiomic features (minimum, maximum, mean, standard deviation, kurtosis, skewness, and first, fifth, 95th, and 99th percentiles) and 300 GLCM radiomic features (histogram features and texture features) were extracted from the VOIs of the 2.5-min early subtraction images and b = 800 DWI images. Figure 4 shows the radiomic analysis pipeline.

Figure 4
figure 4

Radiomics prediction pipeline for pCR. (A) Segmentation process of DCE and DWI images. (B) Different radiomic features extraction. (C) Statistical analysis to search for the best model for pCR prediction.

Statistical analysis

Patient clinicopathologic characteristics were summarized using frequencies, percentages, means, standard deviations, medians, minimums, and maximums. The demographic and clinical characteristics were compared between patients with pCR and non-pCR using Wilcoxon rank sum test or Fisher’s exact test.

AUC was used for univariate analysis in evaluating the performance of predicting pCR status. Logistic regression with elastic net regularization was performed for radiomic feature selection. Both regularization and mixing parameters were optimized using fivefold cross-validation based on AUC. Stratified sampling was applied to divide subjects into training and testing data. The performance of the training/testing-based logistic regression with the elastic net penalty was compared with threefold cross-validation logistic regression model with the elastic net penalty and SVM with the linear, radial basis function (RBF), or Gaussian kernel classifier.

P-value less than 0.05 was considered statistically significant. For inter-reader agreement, the Pearson correlation was computed to evaluate the linear relationship of each imaging feature between 2 readers. Wilcoxon signed-rank test was applied to test the difference between 2 readers per feature. The ratios of inter-reader to intrareader variability were calculated per feature and summarized by range, mean, and median. The statistical analysis was performed by using R software (version 4.0.3, R Foundation for Statistical Computing, Vienna, Austria) and the packages glmnet and pROC (version 1.9.1).