Introduction

Neoadjuvant chemotherapy (NAC) is a standard of care for locally advanced and inflammatory breast cancers and has become the major approach in treating early-stage triple-negative breast cancers (TNBC) and human epidermal growth factor receptor 2 (HER2)-overexpressing breast cancers1. The purpose of NAC is to reduce the size of the tumor, thereby facilitating the option of breast-conserving surgery. Furthermore, the response to NAC can serve as an important biomarker for determining long-term prognosis, especially for HER2-overexpressing breast cancers and TNBC2,3.

Meanwhile, responses to NAC vary considerably, with pathologic complete response (pCR) rates4,5 ranging from 0.3% to 50.3% and disease progression in approximately 20% of cases5,6, depending on the cancer subtype. Currently, breast MRI is the most sensitive tool for evaluating tumor response to NAC7,8, although no standardized criteria have been established for monitoring response on MRI other than tumor size, as defined by the Response Evaluation Criteria in Solid Tumors (RECIST)9. Assessing clinical response is crucial for potentially altering treatment plans for patients who exhibit chemoresistance, to mitigate unnecessary toxic therapy and patient burden, and even for initial planning10.

TNBC is a biologically aggressive subtype that is characterized by the lack of estrogen receptor, progesterone receptor, and HER2. TNBC can be sub-classified into at least six molecular subtypes, each with distinct clinical profiles and chemo-responsiveness, which may pose a challenge in predicting prognostic outcomes and facilitating prompt clinical therapeutic decisions11. Many studies have explored a range of breast MRI parameters to predict the response to NAC of TNBC12,13,14. Li et al.12 reported that tumor volume, time to peak enhancement, and androgen receptor status were significant independent predictors of pCR. Two studies evaluated kinetic information from dynamic contrast-enhanced (DCE) MRI to predict the pathological tumor response13,14. However, a prior meta-analysis highlighted that pretreatment MR parameters faced challenges in distinguishing NAC responders from non-responders, with variations often attributed to inconsistent methodologies15. In addition, most studies anchor their findings to post-surgical pathological evaluations such as pCR or residual cancer burden. Although pathological and clinical responses in TNBC are correlated, they are not identical16,17. The clinical response rate in TNBC is about 20–40%, with a pCR rate of 30–50%17,18. Despite improved treatments such as TNBC-targeted therapy or immunotherapy, there are still patients who exhibit resistance to NAC. Therefore, investigating alternative non-invasive imaging biomarkers is essential to enhance the identification of TNBC patients unlikely to benefit from NAC.

Radiomics, which involves extracting and analyzing multiple quantitative features from medical images using advanced computational techniques, has grown explosively and has proven useful in various fields of medical image analysis. In recent years, several studies have assessed the role of quantitative radiomics features extracted from MR images in predicting prognosis and treatment response in patients with breast cancer19,20,21. However, a significant research gap persists in the early identification of TNBC non-responders to NAC based on RECIST criteria. Therefore, our study aimed to evaluate the diagnostic performance of pretreatment MRI, using both qualitative morphologic features and quantitative radiomics features paired with machine learning algorithms, to predict non-responders to NAC among patients with TNBC.

Results

Characteristics associated with non-responder group

This study included 135 women (mean age ± standard deviation, 51.5 ± 10.5 years). Of the 135 patients, 107 (79.3%) were categorized as responders and 28 (20.7%) as non-responders. The responder group comprised 32 cases (29.9%) of complete response (CR) and 75 cases (70.1%) of partial response (PR). The non-responder group comprised 12 cases (42.9%) of stable disease (SD) and 16 cases (57.1%) of progressive disease (PD). The median time between baseline MRI and the start of NAC was 23.7 days (range, 3–152 days). Of the 28 non-responders, two (7.1%) showed PD at mid-treatment MRI and worsened despite the second-line NAC regimen. Eleven (39.3%) patients showed SD at mid-treatment MRI, with no alterations in the NAC regimen; of these, eight sustained SD and three manifested PD after NAC completion. Fifteen (53.6%) patients displayed PR at mid-treatment MRI but showed PD or SD on follow-up.

Among 105 responders who underwent surgery, 32 (30.5%) patients attained pCR, with 73 (69.5%) manifesting non-pCR after surgery. None of 24 non-responders who underwent surgery achieved pCR (p = 0.001; Table 1).

Table 1 Baseline clinicopathologic factors of responder and non-responder groups.

The TE regimen was predominantly utilized in the non-responder group (p = 0.02), and high-grade tumors were more prevalent in the same group (p = 0.007). A higher frequency of mastectomy over breast-conserving surgery occurred in the non-responder group (p = 0.011). The pre-NAC clinical stage displayed no significant difference between the groups (Table 1).

In clinical MRI findings, tumors in the non-responder group showed less rim enhancement compared to the responder group (p = 0.009), with no other significant differences in clinical MRI findings noted between the groups (Table 2).

Table 2 Clinical magnetic resonance imaging findings of responder and non-responder groups.

Interobserver reproducibility

The ICC between reader 1 (R1) and reader 2 (R2) for ADC values was 0.758 (95% CI, 0.573–0.862), indicating moderate agreement. The agreement for intratumoral T2 hyperintensity and peritumoral edema between R1 and R2 was moderate (κ = 0.529 and 0.539, respectively). Consequently, we generated consensus readings of these clinical MRI findings as training data for the models. The ICCs (mean ± standard deviation) for radiomics features extracted from intra- and peritumor-ROIs between R1 and reader 3 (R3) were 0.968 ± 0.036 (range, 0.878–0.999) and 0.958 ± 0.053 (range, 0.830–0.998), signaling excellent agreement. Therefore, the radiomics features from R1 were used to train all radiomics-based models.

Diagnostic performance of clinicopathologic factors and clinical MRI findings

Feature importance was evaluated within the models built on clinicopathological factors and clinical MRI findings. Among the clinicopathological factors, histologic grade and NAC regimen were identified as the most influential determinants of model output (Supplementary Fig. S1). Among the clinical MRI findings, mass internal enhancement and the proportion of kinetic components were the most important. The model based on clinicopathologic factors achieved an AUROC of 0.697 (95% CI 0.579–0.815), whereas the model based on clinical MRI findings yielded an AUROC of 0.706 (95% CI 0.578–0.834) (Fig. 1). The combined model, using both, achieved a higher AUROC of 0.752 (95% CI 0.644–0.860), although the differences from the models using clinicopathologic factors or clinical MRI findings alone did not reach statistical significance (p = 0.08 and p = 0.33, respectively; Fig. 1).

Fig. 1

Receiver operating characteristic (ROC) curves and bar plots depicting the performances of clinicopathologic factors, clinical MRI findings, their combination (a), and radiomics features extracted from different regions of interest (ROIs): Intratumor-ROI, peritumor-ROI, and combined-ROI (b).

Diagnostic performance of radiomics models

The peritumor-ROI model revealed an AUROC of 0.739 (95% CI 0.601–0.877), compared to the intratumor-ROI model which yielded an AUROC of 0.713 (95% CI 0.548–0.878) (Fig. 1). The combined-ROI model yielded an AUROC of 0.749 (95% CI 0.614–0.884). The associated p values exceeded 0.05, suggesting a lack of statistical significance in the variations observed amongst the radiomics models. Additionally, no significant difference was observed between the combined-ROI model and the model combining clinicopathologic factors and clinical MRI findings (p = 0.21).

Among the models developed on radiomics features, “SurfaceArea” emerged as the most important feature within the shape category (Supplementary Fig. S1). “Skewness”, within the first-order category, showed equivalent prominence.

Diagnostic performance of a comprehensive integrated model

The integrated model that included radiomics features, clinicopathologic factors, and clinical MRI findings exhibited an AUROC of 0.802 (95% CI 0.699–0.905; Fig. 2). The metrics of this integrated model are presented in Table 3, showing superior sensitivity (0.881) and negative predictive value (0.948). However, comparisons between predictive models did not yield statistically significant differences (all, p > 0.05).

Fig. 2

ROC curve for the integrated model of clinicopathologic factors, clinical MRI findings, and radiomics features.

Table 3 Performance estimation of prediction models based on data types.

Diagnostic performance of the prediction models after propensity score matching

Twenty-eight patients from the non-responder group were matched with forty-four patients from the responder group, for a total of seventy-two patients. Confounding clinicopathologic factors were matched between the groups (Supplementary Table S1). After matching, a significant difference in mass internal enhancement was observed between the groups (p = 0.009; Supplementary Table S2). The rank order of the top 10 radiomics features remained consistent after propensity score matching, indicating that a small subset of radiomics features consistently drove the model and emphasizing the importance of precise feature selection for model reliability and efficacy (Supplementary Fig. S2). The model based on clinical MRI findings yielded an AUROC of 0.681 (95% CI 0.543–0.818), and the radiomics model based on the combined-ROI exhibited an AUROC of 0.728 (95% CI 0.608–0.847; Supplementary Fig. S3). These differences in AUROC were not statistically significant, either relative to the model using clinical MRI findings alone or among the radiomics models (all, p > 0.05).

Discussion

This study demonstrated the feasibility of pretreatment MRI-based machine learning models in predicting non-responders to NAC among patients with TNBC. The predictive model that combined clinicopathological factors and clinical MRI findings yielded an AUROC of 0.752. A model using radiomics features based on the combined-ROI yielded an AUROC of 0.749. Although no statistically significant difference was revealed, the integrated model encompassing clinicopathologic factors, clinical MRI findings, and radiomics features achieved an AUROC of 0.802. This model included a vast array of quantitative imaging features that potentially capture subtle patterns undetectable in routine clinical assessment and might offer a more comprehensive analysis than models based on a single data type. This approach aligns with previous literature in which combining radiomics analysis with conventional clinical factors, representing traditional expert labeling, outperformed models trained solely with either radiomics or clinical factors22,23.

Currently, there are no standard imaging criteria for assessing the responses to NAC. However, several studies have reported the reliability of MRI in evaluating responses to NAC in breast cancer24,25, and more recently, developments in the quantitative radiomics analysis of medical images have shown promising results in improving MRI accuracy for response prediction26,27,28. Sie and Huang et al.29 developed a quantitative measure of intratumoral heterogeneity by using radiomics on DCE-MRI for predicting pCR in patients with breast cancer. Two studies have demonstrated that radiomics analysis utilizing multiparametric MR sequences outperformed the diagnostic performance based on a single sequence in predicting the response to NAC30,31. In this study, our model aimed to predict non-responders based on clinical imaging assessments as the ground truth. Non-responders are typically excluded from undergoing surgery and often fail to achieve critical outcomes like pCR, offering a different study perspective compared to previous studies that focus on pCR as their primary endpoint.

We preferentially employed DCE-MRI for radiomics analysis, as it is currently considered the most reliable method for assessing NAC responses. DCE-MRI’s ability to estimate lesion conspicuity and measure perfusion and vascular permeability makes it a valuable tool for indicating angiogenic changes14,32. Among the radiomics features, “SurfaceArea” was identified as the most pertinent feature33. Although establishing a direct correlation between a singular radiomics feature and tumor response can be intricate, a larger “SurfaceArea” is suggestive of the presence of multiple tumor subregions with varied sensitivity to chemotherapy34,35. The “Skewness” from the first-order category also held significant relevance, offering insights into intratumoral heterogeneity36. These findings advocate for the inclusion of detailed radiomic analysis in the diagnostic process, offering a nuanced perspective on tumor behavior and chemotherapy susceptibility.

In this study, we performed a radiomics analysis using both intra- and peritumor-ROIs and also examined a radiomics model that combined the two ROIs in order to compare model performances. The peritumoral region is known to undergo substantial biological changes, reflecting angiogenic and lymphangiogenic activities and immune infiltration37,38. This has been supported by studies such as that of Braman et al., who demonstrated that radiomics features from the peritumoral region are associated with chemotherapy response28,38,39. However, the definition of the peritumor-ROI, which we established as a uniform 3-mm extension from the tumor boundary in this study, could have influenced the lack of statistically significant differences between models. Therefore, it is essential to conduct additional research to carefully define the peritumoral region and assess its impact on the radiomics approach.

In a subgroup analysis, propensity score matching of clinicopathologic factors was performed to evaluate the predictive performance of the radiologic indicators alone. Among the clinical MRI findings considered in this study, the absence of rim enhancement emerged as the primary predictor of non-response to NAC, both before and after propensity score matching. Rim enhancement reflects the blood flow dynamics within the tumor, and its prominence can be indicative of rapid tumor growth and intratumoral necrosis40. Consequently, tumors exhibiting high histologic grades and rapid cellular proliferation are conjectured to be more responsive to chemotherapy. Although we did not evaluate the association between rim enhancement on breast MRI and radiomics features, one study reported that high positive “Skewness” could indicate a lower mean signal intensity on contrast-enhanced T1-weighted images, which may represent intratumoral necrosis41. Further research is warranted to quantify the underlying biologic characteristics of rim enhancement and assess its association with treatment response in breast cancer.

This study has several limitations. First, we did not use volume measurements or functional tumor volume to estimate tumor size, despite their known association with recurrence-free survival42, because the RECIST criteria provide no established standard for enhancement-based or functional volumetric measurements. Second, we did not analyze images acquired at different timepoints across the chemotherapy cycles. Third, we only used the first contrast-enhanced T1-weighted image for radiomics analysis, although the utility of diffusion-weighted imaging or T2-weighted sequences as pretreatment predictors of response has been reported43,44. Future research should employ standardized MR protocols within the context of multiparametric MRI for predicting non-responders. Fourth, this is a single-center retrospective study with a relatively small number of non-responders, making selection bias unavoidable. Although we performed propensity score matching to reduce the impact of confounding factors, not all potential confounding factors were thoroughly matched. Moreover, the NAC regimens were heterogeneous. However, our study reflects the real-world clinical scenario with diverse treatment regimens and a limited number of non-responder events among TNBC patients. Lastly, we did not perform external validation, relying instead on five-fold cross-validation to mitigate the risk of overfitting; future studies with multicenter prospective data are needed to validate the proposed methodologies.

In conclusion, this study demonstrates the potential of machine learning models using pretreatment MRI-based clinical and radiomics features in predicting non-responders to NAC among TNBC patients. Integrating clinicopathologic factors, clinical MRI findings, and radiomics augments diagnostic capabilities, paving the way for more personalized and effective cancer management strategies.

Methods

Study population and clinicopathologic factors

This retrospective study was approved by the Institutional Review Board (IRB) of Chonnam National University Hwasun Hospital (IRB No. H2023-0322), which waived the requirement for written informed consent. Our research was carried out in accordance with the Declaration of Helsinki of 1975, revised in 2013.

Among 651 consecutive patients with primary invasive breast cancer who received NAC between January 2015 and October 2022, 142 had biopsy-proven TNBC. Patients were excluded if they (1) had an initial M1 stage (n = 3), (2) had MR images obtained on a 1.5 T scanner (n = 1), (3) underwent excision or vacuum-assisted biopsy for the primary cancer (n = 1), (4) had treatment deferred for over six months or failed to meet the minimum protocol time from baseline (n = 1), or (5) had imaging data that were unavailable during processing (n = 1) (Fig. 3).

Fig. 3

Flowchart of the study population.

All patients received standard NAC treatment (paclitaxel/docetaxel and epirubicin [TE], doxorubicin and cyclophosphamide followed or preceded by a taxane [AC ± T], weekly paclitaxel and carboplatin followed by anthracycline and cyclophosphamide with concurrent programmed cell death protein 1-targeted monoclonal antibody [wPC], or fluorouracil, adriamycin, and Cytoxan [FAC]) at our institution in adherence to the National Comprehensive Cancer Network (NCCN) guidelines45. Before initiating NAC, patients underwent conventional breast imaging and clinical breast examinations. Furthermore, breast MRI and clinical breast assessments were conducted at mid-treatment to assess the clinical efficacy of the treatment before surgery. Surgical intervention was considered after completion of NAC or when further NAC was deemed non-beneficial. In instances where the cancer advanced to an unresectable stage due to disease progression, alternative therapeutic strategies were discussed by a multidisciplinary tumor board at our institution. Any adjuvant treatments were administered in alignment with NCCN guidelines45.

Data on clinicopathological factors, including age, NAC regimen, histologic type, histologic grade, Ki-67 proliferation rate, clinical T and N stages, and surgical method, were collected from all available electronic medical records. Prior to NAC, patients underwent a core-needle biopsy to ascertain tumor histology, hormone receptor status, Ki-67 proliferation rate, and HER2 overexpression status. TNBC was defined as less than 1% positivity for estrogen and progesterone receptors and a HER2 score of 0 or 1+ on immunohistochemical staining or the absence of gene amplification on fluorescence or silver in situ hybridization (HER2/CEP17 ratio < 2.0)46,47. We also investigated the pathological stage and pCR rate to evaluate the correlation between the clinical response and the pathological response in patients who underwent surgical treatment. pCR was defined as no residual invasive cancer in the breast or lymph nodes after surgery. Clinical and pathological stages were determined using the 8th edition of the American Joint Committee on Cancer tumor-node-metastasis staging system48.

Imaging protocol

All pre- and post-NAC breast MR examinations were conducted using 3 T scanners (Tim Trio, Skyra, Skyra II; Siemens Healthcare, Erlangen, Germany) with a dedicated breast coil incorporating a minimum of four channels. These examinations comprised axial fat-saturated turbo spin-echo T2-weighted images with the following parameters: repetition time (TR)/echo time (TE) of 3840/70 ms, field-of-view (FOV) of 340 × 340 mm², matrix size of 384 × 286, and slice thickness of 2 mm. Additionally, axial fat-saturated spoiled gradient-echo dynamic contrast-enhanced (DCE) T1-weighted images were acquired using TR/TE of 4.5/1.7 ms, FOV of 320 × 320 mm², matrix size of 448 × 358, and slice thickness of 1.5 mm. The DCE-MRI protocol included one unenhanced and five contrast-enhanced series and was performed using intravenous administration of gadoterate meglumine (Dotarem; Guerbet, Aulnay-sous-Bois, France) at a dosage of 0.1 mmol/kg body weight. For diffusion-weighted imaging (DWI), a TR/TE of 6430/58 ms, FOV of 160 × 340 mm², matrix size of 78 × 166, in-plane resolution of 2 × 2 mm², slice thickness of 4 mm, and an acquisition time of 4 min 1 s were employed. DWI was acquired before the DCE-MRI sequence using a readout-segmented echo-planar imaging sequence with spectrally adiabatic inversion recovery and parallel imaging. An apparent diffusion coefficient (ADC) map was generated, applying diffusion gradients along the x-, y-, and z-axes, based on two b values, 0 and 800 s/mm².

Assessment utilizing a commercially available computer-aided detection (CAD) system (CADstream, version 6.0; Confirma, Kirkland, WA, USA) facilitated the meticulous examination of contrast enhancement kinetics. Both unenhanced and contrast-enhanced series were integrated into the CAD system, and an automated region-of-interest (ROI) was situated on the lesion, abiding by the enhancement threshold delineated for MRI evaluations provided by the CAD system.

Assessment of treatment response

The clinical response was determined using DCE-MRI. Tumor diameter was measured as the longest diameter of enhancing solid lesions in the largest of the three planes (x, y, z-axes). Within the responder category, patients demonstrating either complete response (CR) or partial response (PR) were incorporated. In contrast, non-responders were those manifesting stable disease (SD) or progressive disease (PD), despite the mid-treatment modification of the NAC regimen, signifying the development of chemotherapy resistance. When multiple tumors were present, the maximal tumor diameters of the two largest lesions were combined for assessment. Two representative samples of non-responder and responder are presented in Fig. 4.
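The grouping described above can be illustrated with a minimal sketch. The numeric thresholds (a decrease of at least 30% for PR; an increase of at least 20% and at least 5 mm for PD) are those of RECIST 1.1 and are not restated in this article. Using the baseline sum as the reference for progression, rather than the smallest sum on study, is a simplifying assumption, and all function names are hypothetical.

```python
def summed_diameter(lesion_diameters_mm):
    """Sum the longest diameters of the two largest lesions (multifocal case)."""
    return sum(sorted(lesion_diameters_mm, reverse=True)[:2])


def recist_category(baseline_mm, followup_mm):
    """Classify response from summed longest diameters (mm).

    Thresholds follow RECIST 1.1. Using the baseline sum as the reference
    for progression (instead of the nadir) is a simplifying assumption.
    """
    if baseline_mm <= 0:
        raise ValueError("baseline sum of diameters must be positive")
    if followup_mm == 0:
        return "CR"  # no measurable enhancing lesion remains
    change = (followup_mm - baseline_mm) / baseline_mm
    if change <= -0.30:
        return "PR"
    if change >= 0.20 and (followup_mm - baseline_mm) >= 5:
        return "PD"
    return "SD"


def is_non_responder(category):
    """Non-responders are SD or PD; responders are CR or PR."""
    return category in {"SD", "PD"}
```

For example, a baseline sum of 40 mm that shrinks to 20 mm is classified as PR (responder), whereas growth to 50 mm is classified as PD (non-responder).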

Fig. 4

Two representative samples of non-responder and responder. Axial fat-saturated, contrast-enhanced MR images before (a) and after (b) neoadjuvant chemotherapy demonstrate disease progression in an irregular, heterogeneously enhancing mass. The predicted rates of non-response according to clinicopathologic + MRI model, radiomics model, and integrated model were 86.4%, 91.9%, and 85.9%, respectively. Surgical specimen image (hematoxylin and eosin [H&E] staining, c) demonstrates extensive residual cancer burden following neoadjuvant chemotherapy. Axial fat-saturated, contrast-enhanced MR images before (d) and after (e) neoadjuvant chemotherapy demonstrate a radiological complete response in an oval rim-enhancing mass. The predicted rates of non-response according to clinicopathologic + MRI model, radiomics model, and integrated model were 12.9%, 4.8%, and 12.1%, respectively. Surgical specimen image (H&E staining, f) demonstrates predominant fibrous tissue and scattered inflammatory cells, with residual malignant cells discernible.

Analysis of clinical MRI findings

Pre-NAC MRI findings based on the 2013 breast imaging reporting and data system (BI-RADS) MRI lexicon of the American College of Radiology were extracted from the interpretation reports stored within the picture archiving and communication system. These findings included the quantity of fibroglandular tissue, background parenchymal enhancement, tumor morphology (mass vs. non-mass enhancement), the shape, margin, and internal enhancement attributes of masses, the distribution and internal enhancement characteristics of non-mass enhancement, and multifocality. Two breast radiologists, H-j.L. (reader 1; R1) and H.S.L. (reader 2; R2), with 5 and 19 years of experience in breast MRI, respectively, blinded to the treatment responses to NAC, independently performed a qualitative visual assessment of intratumoral T2 hyperintensity and peritumoral edema on pretreatment MRI and then reached a consensus. Intratumoral T2 hyperintensity was characterized as signal intensity surpassing that of the surrounding tissue or equivalent to water, excluding concurrent enhancement49. Peritumoral edema was identified as T2 hyperintensity comparable to water located around or along the posterior aspect of the tumor50. Both readers independently delineated a circular ROI inside the tumor on a single slice of the ADC map to extract the ADC values, centering on the visually most hypointense area within the tumor and avoiding necrotic tissue, evident cystic portions, and disruptive artifacts.

Subsequently, R1 managed the CAD processing and scrutinized the outputs. For the analysis of enhancement kinetics, percentages of early (slow [< 50%], medium [50–100%], and rapid [> 100%]) and delayed phase (persistent, plateau, and washout) profiles were qualitatively evaluated. Persistent, plateau, and washout components corresponded to pixel intensities indicating > 10% increase, between 10% increase and 10% decrease, and > 10% decrease, respectively, in the final contrast-enhanced image relative to the initial one. Additionally, two quantitative parameters, angiovolume (representing the cumulative enhancing volume of the lesion) and peak enhancement (denoting the highest level of enhancement within the lesion), were documented.
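The kinetic categories above can be expressed as simple threshold rules. The following is a minimal sketch of those rules only, not of the CAD system's implementation; measuring early enhancement relative to the unenhanced signal is an assumption, and the function names are hypothetical.

```python
def early_phase(unenhanced, early_post):
    """Early enhancement category from the percentage signal increase.

    Assumes enhancement is measured relative to the unenhanced signal.
    """
    pct = 100.0 * (early_post - unenhanced) / unenhanced
    if pct < 50:
        return "slow"
    if pct <= 100:
        return "medium"
    return "rapid"


def delayed_phase(initial_post, final_post):
    """Delayed category from the final vs. initial contrast-enhanced signal."""
    pct = 100.0 * (final_post - initial_post) / initial_post
    if pct > 10:
        return "persistent"
    if pct < -10:
        return "washout"
    return "plateau"


def delayed_percentages(pixels):
    """Percentage of lesion pixels in each delayed-phase category.

    `pixels` is an iterable of (initial_post, final_post) signal pairs.
    """
    counts = {"persistent": 0, "plateau": 0, "washout": 0}
    for initial_post, final_post in pixels:
        counts[delayed_phase(initial_post, final_post)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pixels supplied")
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Applied to every enhancing pixel of a lesion, `delayed_percentages` yields the proportions of persistent, plateau, and washout components used as model inputs.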

ROI segmentation

The contrast-enhanced fat-saturated T1-weighted image acquired at 90 s after contrast media injection was selected from the DCE-MRI series for radiomics feature analysis because it provides vital information about tumor characteristics and vascular properties. Two radiologists, H-j.L. (R1) and J.E.L. (reader 3; R3), independently drew intratumor-ROIs semi-automatically using the “grow from seed” tool in the publicly available software 3D Slicer (http://www.slicer.org). This tool allows users to initiate the segmentation by selecting a seed point within the tumor and then expands the segmentation based on intensity or other chosen criteria. Segmentations from R1 were used exclusively for radiomics modeling, while those from R3 were used to assess interobserver reproducibility. Peritumor-ROIs, extending 3 mm from the intratumor-ROI boundary, were generated automatically via a built-in function in 3D Slicer, and the combined-ROI was the union of the intra- and peritumor-ROIs.
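The relationship between the three ROIs can be illustrated with a small brute-force sketch. It is not the 3D Slicer function used in the study. Measuring distance between voxel centers is an assumption, the function name is hypothetical, and the exhaustive search is only practical for toy-sized grids.

```python
import math
from itertools import product


def peritumor_shell(tumor_voxels, shape, spacing_mm, margin_mm=3.0):
    """Return voxels outside the tumor that lie within `margin_mm` of it.

    tumor_voxels: set of integer index tuples inside the intratumor ROI.
    shape, spacing_mm: image size and voxel spacing along each axis.
    Distances are taken between voxel centers (an assumption).
    """
    shell = set()
    for voxel in product(*(range(n) for n in shape)):
        if voxel in tumor_voxels:
            continue
        nearest = min(
            math.sqrt(sum(((a - b) * s) ** 2
                          for a, b, s in zip(voxel, t, spacing_mm)))
            for t in tumor_voxels
        )
        if nearest <= margin_mm:
            shell.add(voxel)
    return shell


# Toy 2D example with 1 mm isotropic pixels.
tumor = {(5, 5), (5, 6)}
peri = peritumor_shell(tumor, shape=(11, 11), spacing_mm=(1.0, 1.0))
combined = tumor | peri  # combined-ROI is the union of both regions
```

Passing the true voxel spacing matters because the 3-mm margin is defined in physical units, so an anisotropic image needs a different number of voxels along each axis.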

Image preprocessing and feature extraction for radiomics analysis

To minimize inherent differences in pixel intensities across the three MR scanners, z-score normalization was applied to all voxels of each MR image51. Following z-score normalization, the absolute value of the minimum pixel intensity across the entire image was added to each pixel, ensuring that all pixel values were non-negative. The z-score algorithm was implemented from scratch using in-house scripts written in Python (http://www.python.org). A total of 107 original radiomic features complying with the image biomarker standardization initiative52 were extracted using Pyradiomics from the three ROIs (intratumor, peritumor, and combined)53. The extracted features encompassed 14 shape, 18 first-order, 24 GLCM, 14 GLDM, 16 GLRLM, 16 GLSZM, and five NGTDM features (Supplementary Table S3). The entire workflow is visually represented in Fig. 5.
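The normalization step described above can be sketched as follows. This is not the authors' in-house script. The function name is hypothetical, the image is represented as a flat list of voxel intensities for simplicity, and the use of the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def zscore_shift(voxels):
    """Z-score normalize all image voxels, then shift so the minimum is zero."""
    mu = fmean(voxels)
    sigma = pstdev(voxels)  # population SD; the source does not specify which
    if sigma == 0:
        raise ValueError("a constant image cannot be z-score normalized")
    z = [(v - mu) / sigma for v in voxels]
    offset = abs(min(z))  # absolute value of the minimum normalized intensity
    return [value + offset for value in z]


normalized = zscore_shift([1, 2, 3, 4, 5])
```

The shift preserves intensity differences between voxels while keeping every value non-negative.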

Fig. 5

Workflow scheme depicting the processes of data acquisition, segmentation, radiomics feature extraction, modeling, and response prediction.

Development of prediction models

AutoGluon, a cutting-edge automated machine learning framework known for developing advanced stacking ensembles54, was employed to build seven classifier models from different combinations of variables: (1) clinicopathologic factors, (2) clinical MRI findings, (3) clinicopathologic factors and clinical MRI findings (a representation of conventional radiologist’s interpretation in clinical settings), (4) radiomics features (intra-, peri-, and combined-ROI), and (5) clinicopathologic factors, clinical MRI findings, and radiomics features. AutoGluon integrates various base learners, such as extremely randomized trees, k-nearest neighbors, gradient boosting machines, random forests, and tabular neural networks, into a cohesive multilayer stack ensemble. It autonomously determines feature importance, offering insights into influential variables, aiding in deciphering responses to NAC, and accepting a myriad of data types like clinicopathological factors, clinical MRI findings, and radiomics. It automates essential steps such as hyperparameter tuning, feature engineering, and data preprocessing. After the prediction models were built, the performances of the models were compared.

To mitigate the risk of overfitting due to the limited size of the dataset, a five-fold cross-validation approach was adopted as a method of internal validation. In this process, the dataset was partitioned into five distinct folds. For each iteration of the validation, one fold was cyclically designated as the test set, while the remaining four folds were used as the training set.
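The fold rotation described above can be sketched as follows. The shuffling, the fixed seed, and the function name are assumptions made for illustration; the source does not state whether the folds were stratified by response group, and this sketch does not stratify.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) for each cross-validation fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


splits = list(five_fold_splits(135))
```

With 135 patients, each fold holds 27 test cases and 108 training cases, and every patient appears in exactly one test fold.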

Each feature’s significance was appraised using AutoGluon’s internal mechanism, involving the alteration of testing data by random shuffling of feature values and evaluating the consequent model performance. This methodology is congruent with the essential tenets of LIME55. Additionally, the average of the analysis through five-fold cross-validation was computed for enhanced robustness, reinforcing reliability by averaging importance scores across iterations.
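The shuffle-and-rescore idea described above is commonly called permutation importance. The following is a generic sketch of that idea and not AutoGluon's internal code. The function signature, the toy model, and the number of repeats are all hypothetical.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Mean drop in `metric` when each feature column of X is shuffled.

    predict: callable mapping a list of rows to predictions.
    X: list of rows (lists); y: true labels; metric(y, predictions) -> float.
    """
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_predict(rows):
    """Toy model that looks only at the first feature."""
    return [1 if row[0] > 0.5 else 0 for row in rows]


X_toy = [[float(i % 2), 5.0] for i in range(8)]
y_toy = [i % 2 for i in range(8)]
importances = permutation_importance(toy_predict, X_toy, y_toy, accuracy)
```

In this toy case the second feature is ignored by the model, so shuffling it changes nothing and its importance is zero, whereas shuffling the first feature degrades accuracy.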

Statistical analysis

Interobserver reproducibility of the ADC values and radiomics features by two readers was estimated using an intra-class correlation coefficient (ICC)56. Interobserver agreement between the radiologists regarding intratumoral T2 hyperintensity and peritumoral edema was analyzed using kappa statistics57. Clinicopathologic factors and clinical MRI findings were compared based on treatment response using the Mann–Whitney U, Fisher’s exact, and Chi-squared tests. Receiver operating characteristic (ROC) curves were generated for each of the five folds in the cross-validation scheme to assess the performance of the prediction models. A uniform range of false positive rates (FPRs), incrementing from 0 to 1 at intervals of 0.01, served as the foundation for aggregating the individual ROC curves. True positive rates (TPRs) from each fold were interpolated onto these common FPR values, enabling a consistent comparison across folds. Subsequently, the mean TPR was computed for each standardized FPR, providing a robust aggregate curve. The corresponding standard deviations at each FPR were calculated to establish confidence intervals (CIs). Finally, the area under the ROC curve (AUROC) was obtained for each fold, and the overall performance metric was calculated as a macro-average of these individual AUROCs58. A bootstrap resampling technique was used to statistically compare the average AUROC values obtained from five-fold cross-validation. For each model, 5000 bootstrap samples were drawn from the distribution of AUROCs for the five folds. The p-value was computed by evaluating the distribution of differences between the bootstrap samples of the two models being compared. Because of the exploratory nature of our study, we did not adjust for increased alpha errors caused by multiple comparisons.
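The ROC aggregation and bootstrap comparison described above can be sketched as follows. The interpolation grid and macro-averaging follow the text. The exact rule used to turn the bootstrap differences into a p-value is not given in the source, so the two-sided percentile rule in `bootstrap_p_value` is an assumption, as are all function names.

```python
import random
from statistics import fmean, stdev

FPR_GRID = [i / 100 for i in range(101)]  # 0 to 1 in steps of 0.01


def interpolate(x, xs, ys):
    """Linear interpolation of y at x; xs must be non-decreasing."""
    if x <= xs[0]:
        return ys[0]
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y1 if x1 == x0 else y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return ys[-1]


def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def aggregate_roc(fold_curves):
    """fold_curves: list of (fpr_list, tpr_list) pairs, one per fold.

    Returns the mean TPR and its SD on FPR_GRID, plus the macro-average AUROC.
    """
    on_grid = [[interpolate(f, fpr, tpr) for f in FPR_GRID]
               for fpr, tpr in fold_curves]
    mean_tpr = [fmean(col) for col in zip(*on_grid)]
    sd_tpr = [stdev(col) for col in zip(*on_grid)]
    macro_auc = fmean(trapezoid_auc(fpr, tpr) for fpr, tpr in fold_curves)
    return mean_tpr, sd_tpr, macro_auc


def bootstrap_p_value(aucs_a, aucs_b, n_boot=5000, seed=0):
    """Two-sided bootstrap p-value for the difference in mean fold AUROC.

    The percentile-based rule below is an assumed, illustrative choice.
    """
    rng = random.Random(seed)
    diffs = [fmean(rng.choices(aucs_a, k=len(aucs_a)))
             - fmean(rng.choices(aucs_b, k=len(aucs_b)))
             for _ in range(n_boot)]
    below = sum(d <= 0 for d in diffs) / n_boot
    above = sum(d >= 0 for d in diffs) / n_boot
    return min(1.0, 2 * min(below, above))
```

Because only five fold-level AUROCs are resampled per model, the resulting p-values are coarse and should be read as exploratory.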

One-to-two nearest-neighbor matching without replacement, using propensity scores generated from logistic regression, was performed to reduce the impact of confounding clinicopathologic factors (age, NAC regimen, histologic grade, clinical T stage, clinical N stage)59. After matching, we performed a Wilcoxon signed-rank test for continuous variables and a McNemar test for categorical variables. The same radiomics analysis was repeated on the selected samples. All statistical analyses were performed using R v4.1.2 (R Foundation for Statistical Computing, Vienna, Austria). All statistical tests were two-tailed with a significance level (p) set at 0.05.
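The matching step can be illustrated with a greedy sketch that takes precomputed propensity scores as input. This is written in Python for illustration only (the study used R) and is not the authors' code. The processing order, the optional `caliper` argument, and the function name are assumptions; the source does not describe the rule under which some non-responders received fewer than two matches.

```python
def match_nearest(case_scores, control_scores, ratio=2, caliper=None):
    """Greedy nearest-neighbor matching on propensity score, without replacement.

    case_scores / control_scores: dicts mapping patient id -> propensity score.
    caliper: optional maximum allowed score difference (hypothetical parameter).
    Returns {case_id: [matched control ids]}.
    """
    available = dict(control_scores)
    matches = {}
    for case_id, score in case_scores.items():
        candidates = sorted(available, key=lambda cid: abs(available[cid] - score))
        if caliper is not None:
            candidates = [cid for cid in candidates
                          if abs(available[cid] - score) <= caliper]
        chosen = candidates[:ratio]
        matches[case_id] = chosen
        for cid in chosen:
            del available[cid]  # each control can be used only once
    return matches


non_responders = {"n1": 0.30, "n2": 0.70}
responders = {"r1": 0.28, "r2": 0.33, "r3": 0.69, "r4": 0.90, "r5": 0.10}
matched = match_nearest(non_responders, responders)
```

Here `n1` is paired with `r1` and `r2`, after which `n2` takes the closest remaining responders, `r3` and `r4`.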