Introduction

The neoadjuvant treatment approach for breast cancer involves systemic therapy followed by a post-neoadjuvant phase consisting of surgery and/or radiotherapy and/or systemic therapy such as endocrine treatment. Neoadjuvant chemotherapy (NAC) with the tumor in situ allows tumor-response monitoring in vivo using, for instance, dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). After surgery, the actual response of the tumor is established from the excised tissue, as expressed by pathological complete response (pCR) or the residual cancer burden (RCB), which is associated with survival [1,2,3]. Unfortunately, NAC often comes with side effects, some of them long-lasting [4, 5]. Breast surgery has its own side effects, such as an unsatisfactory cosmetic outcome, which is in turn correlated with reduced quality of life [6, 7]. These side effects could possibly be reduced in patients with tumors sensitive to chemotherapy, where a good response might also be achieved with less intensive NAC, and surgery could perhaps be omitted or postponed. To safely select patients for (participation in clinical trials on) de-escalation of NAC or surgery, methods for accurate prediction of response to NAC are essential. Although many efforts have been made to develop new methods, both invasive and non-invasive, none has yet been considered adequate for incorporation into clinical practice.

In the post-neoadjuvant phase following surgery (and in most cases radiotherapy), the question of who will benefit from additional systemic treatment also cannot yet be answered with full confidence. Efforts have been made to develop methods to predict patient survival based on tumor and patient characteristics, such as the Nottingham Prognostic Index or PREDICT, and on gene expression tests such as Oncotype DX, MammaPrint, EndoPredict and Breast Cancer Index [8, 9]. The majority of these methods have, however, been based on patients who received breast surgery as initial treatment, and their role following NAC has not yet been fully established [2, 3]. There is thus still a clinical need for further refinement of post-neoadjuvant treatment decisions.

Many known predictors for response to NAC and post-neoadjuvant prognosis relate to tumor characteristics such as estrogen receptor (ER) and human epidermal growth factor receptor 2 (HER2) status [2], tumor grade [10] and TNM stage [11]. In addition, the immunogenic microenvironment of the tumor plays a role in sensitivity to treatment [12]. Tumor-infiltrating lymphocytes (TILs) are usually less abundant in ER+/HER2− breast cancer, and conflicting results have been reported about their relationship with response to NAC and prognosis after NAC [13]. Because the percentage of TILs is easily scored on hematoxylin and eosin (HE) slides of a diagnostic tumor biopsy, additional research to evaluate its predictive value in the ER+/HER2− group is warranted.

To improve prediction of response, an appealing approach is to combine information from different available sources. One study has suggested the potential of TILs added to pre-treatment radiomics of DCE-MRI to better predict response to NAC [14]. Although promising, this was a relatively small study that included only patients with triple-negative (TN) breast cancer. Hence, the potential of this combination needs further confirmation in TN breast cancer, whereas its value is yet to be explored in the other breast cancer subtypes.

The aim of this study is therefore to explore whether the combination of TILs and DCE-MRI improves the assessment of response to NAC, and whether TILs enable stratification of prognosis in the post-neoadjuvant phase, in the whole group and in two subgroups of patients: those with tumors that were ER positive and HER2 negative (ER+/HER2−) and those with tumors that were either triple negative or HER2 positive (TN&HER2+).

Methods

Patients

Two patient cohorts were combined in this study. Cohort A consisted of patients with stage 1–3 invasive breast cancer of any subtype treated with NAC followed by surgery at the University Medical Center Utrecht between January 1st 2011 and December 1st 2019. Patients with oligometastatic disease treated with curative intent were also included. Patients were excluded if they received fewer than two cycles of NAC, or if neither biopsies nor MR images were available. All patients with an indication for adjuvant endocrine therapy were offered this treatment.

Cohort B consisted of patients with invasive breast cancer from a prospective multicenter NAC study, running from 2020 to 2022. All patients signed informed consent before enrollment. Inclusion criteria were: female patients aged 18 years or older, histologically proven invasive breast carcinoma and planned to receive NAC. Exclusion criteria were grade 1 estrogen receptor (ER)-positive and HER2-negative breast cancer, inflammatory breast cancer, distant metastases on positron emission tomography/computed tomography (PET/CT), prior ipsilateral breast cancer < 5 years ago, other active malignant diseases in the past 5 years (excluding squamous cell or basal cell carcinoma of the skin), pregnancy or lactation and contra-indications for MRI. All patients underwent NAC according to Dutch guidelines [15] and were offered adjuvant endocrine therapy if indicated. The potential of MRI features of cohort B to assess response to NAC in combination with liquid biopsies has been reported previously [16].

Pathological evaluation

The percentage of stromal TILs (%TILs) in the pre-treatment biopsies and surgical resection specimens was assessed by two experienced breast pathologists (RS, PvD), following the International TILs Working Group guideline [17]. The pathologists were blinded to the outcome and non-pathologic predictors during %TILs assessment. The residual cancer burden (RCB) [1] was assessed by an experienced breast pathologist (PvD) using the calculator provided by the MD Anderson website in both cohorts [18]. The pathologist was blinded to the %TILs and non-pathology predictors during RCB assessment. For cohort A, the tumor grade (according to the Nottingham modification of the Bloom and Richardson method [6, 7]) was extracted from the pathology report. If the grade was not available from the report, the biopsy was reassessed for grade (by PvD). Pre-treatment ER, HER2 and nodal status were extracted from the pathology reports as well. For cohort B, central revision of all biopsies was performed (by PvD). Positive nodal status was established by image-guided lymph node fine needle aspiration or biopsy, or by a sentinel node procedure prior to NAC. pCR was defined as ypT0/isN0.

MR imaging

MRI scans were performed before start of treatment in both cohorts. For the patients from cohort A, the MRI that was performed closest to surgery was used as the end-of-treatment MRI. For the patients from cohort B, the end-of-treatment MRI scan was performed after NAC was completed.

For both the pre-treatment and end-of-treatment scans, the following five features were calculated: (1) number of lesions, defined as the number of lesions that were delineated; (2) total lesion volume, in mm³; (3) mean lesion volume, defined as total lesion volume divided by the number of lesions; (4) total largest diameter, defined as the largest diameter spanning all lesions; and (5) sum of the largest diameters, defined as the sum of the largest diameters of the individual lesions. In the data pre-processing step, a variable was created for the relative change in each of the five MRI tumor load features by dividing the end-of-treatment feature value by the pre-treatment feature value (hereafter: change in MRI tumor load).

Lesions were segmented as follows. In cohort A, semi-automated delineation of the breast lesions was performed according to a previously reported method based on histopathology-validated region growing [19]. In short, seed points were placed in the lesions by an experienced biomedical engineer (MHAJ) based on clinical reporting and verified by a radiologist. From each seed point, automated constrained volume growing took place based on contrast uptake. Small manual corrections were made to remove erroneous segmentations of vessels. In cohort B, a previously validated deep learning-based approach built on the nnU-Net framework was employed [20]. Only segmentations in the breast with the biopsy-proven tumor were taken into account.
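To make the feature definitions above concrete, the following is a minimal Python sketch of the five tumor load features and their end-of-treatment to pre-treatment ratios. It is an illustration only and not the study's pipeline: the `Lesion` class, the function names and the example values are hypothetical, and the largest diameter spanning all lesions is taken as an input because it depends on the segmentation geometry.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Lesion:
    volume_mm3: float           # segmented lesion volume
    largest_diameter_mm: float  # largest diameter of this single lesion


def tumor_load_features(lesions: List[Lesion], spanning_diameter_mm: float) -> Dict[str, float]:
    """Five tumor load features for one scan.

    spanning_diameter_mm is the largest diameter spanning all lesions; it is
    passed in because computing it needs the segmentation geometry (assumption).
    """
    n = len(lesions)
    total_volume = sum(lesion.volume_mm3 for lesion in lesions)
    return {
        "number_of_lesions": float(n),
        "total_volume": total_volume,
        "mean_volume": total_volume / n if n else 0.0,
        "total_largest_diameter": spanning_diameter_mm,
        "sum_largest_diameters": sum(lesion.largest_diameter_mm for lesion in lesions),
    }


def change_in_tumor_load(pre: Dict[str, float], end: Dict[str, float]) -> Dict[str, Optional[float]]:
    """End-of-treatment value divided by pre-treatment value, per feature."""
    # A pre-treatment value of zero leaves the ratio undefined; None is returned
    # in that case (a choice made for this sketch, not taken from the study).
    return {key: (end[key] / pre[key] if pre[key] else None) for key in pre}


# Hypothetical patient: two lesions before treatment, one small lesion afterwards.
pre_scan = tumor_load_features([Lesion(1200.0, 20.0), Lesion(300.0, 10.0)], spanning_diameter_mm=45.0)
end_scan = tumor_load_features([Lesion(150.0, 8.0)], spanning_diameter_mm=8.0)
change = change_in_tumor_load(pre_scan, end_scan)
print(change)
```

A ratio below 1 indicates a decrease in tumor load during treatment, and a ratio of 0 indicates that no lesion remained visible on the end-of-treatment scan.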

The DCE-MRI protocol consisted of five post-contrast series, taken at a 60 to 90 s interval, in cohort A and a minimum of three post-contrast acquisitions in cohort B. If fewer than five post-contrast series were available, the last available post-contrast series was used as a substitute for the missing series. Only the T1-weighted dynamic contrast series were analyzed in this study. In cohort A, imaging was performed using either 1.5 T or 3 T MRI scanners from a single vendor (Philips), while in cohort B, imaging was exclusively performed on 3 T scanners from multiple vendors. Fat suppression was applied to all scans.

Statistical analysis

The MRI variables and %TILs values were transformed into variables with normal-shaped distributions using a Box-Cox procedure. To be able to use cases with incomplete data in the model development and evaluation, we imputed missing values. For cohort A and B, if the end-of-treatment tumor load features were missing and they were available at an earlier point (from a scan during treatment), the earlier time-point feature values were used for imputation by last observation carried forward. If the features were only available at one timepoint, the sample mean was used for the change in tumor load features. The %TILs and RCB were imputed with the sample mean.
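The transformation and imputation steps described above can be sketched as follows. This is a hedged Python illustration and not the authors' R code: the function names and example values are hypothetical, the Box-Cox parameter `lam` is supplied by hand rather than estimated from the data, and the sketch does not show how zero values (which Box-Cox cannot handle directly) were treated in the study.

```python
import math
from statistics import mean
from typing import List, Optional


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform of a strictly positive value for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def locf(end_value: Optional[float], during_value: Optional[float]) -> Optional[float]:
    """Last observation carried forward: fall back to the during-treatment value."""
    return end_value if end_value is not None else during_value


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries by the mean of the observed entries."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


# Hypothetical total lesion volumes (mm^3) for four patients; None = scan missing.
pre = [1000.0, 800.0, 500.0, 600.0]
during = [None, 400.0, None, None]
end = [100.0, None, 250.0, None]

changes: List[Optional[float]] = []
for p, d, e in zip(pre, during, end):
    last = locf(e, d)                        # patient 2 uses the during-treatment scan
    changes.append(None if last is None else last / p)

imputed = mean_impute(changes)               # patient 4 has one time point only
transformed = [box_cox(c, lam=0.5) for c in imputed]
print(imputed, transformed)
```

Mean imputation is simple but ignores the uncertainty of the imputed values, which is one reason the resulting confidence intervals should be read with caution.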

Different prediction models were developed to explore the possible additional value of predictors: a model with only %TILs, a model with only change in MRI tumor load, and a third model combining change in MRI tumor load and %TILs. The individual %TILs and change in MRI tumor load models were developed and evaluated in the whole patient population, and then evaluated in the ER+/HER2− and triple negative (TN)/HER2+ subgroups separately. Each model was a (main effects) linear regression model with RCB as the dependent variable, fit using L1-penalized maximum likelihood estimation (LASSO) with the penalty parameter set at the value that yielded the lowest mean squared error in an inner-loop tenfold cross-validation scheme. To estimate the expected out-of-sample performance of the various models in terms of discrimination, we used an additional outer-loop cross-validation (CV) with five folds and 10 repeats. Discrimination for pCR (RCB 0) vs. residual disease (RCB > 0) was evaluated using receiver operating characteristic (ROC) curves and, in particular, the area under the ROC curve (AUC). 95% confidence intervals (CI) were estimated by bootstrapping the original data and performing repeated cross-validation in each bootstrap sample.
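Two building blocks of the evaluation described above can be sketched in plain Python: computing the AUC for pCR vs. residual disease from a continuous RCB prediction, and generating the splits for a repeated five-fold cross-validation. This is an illustrative sketch under stated assumptions, not the authors' implementation: function names and example values are hypothetical, and the LASSO fit, the inner tenfold loop and the bootstrap are omitted.

```python
import random
from typing import List, Sequence, Tuple


def auc_pcr(rcb_observed: Sequence[float], rcb_predicted: Sequence[float]) -> float:
    """AUC for pCR (observed RCB == 0) vs. residual disease (observed RCB > 0).

    A lower predicted RCB should point towards pCR, so the AUC is the share of
    (pCR, residual) pairs in which the pCR case has the lower prediction,
    counting ties as one half.
    """
    pcr = [p for o, p in zip(rcb_observed, rcb_predicted) if o == 0]
    residual = [p for o, p in zip(rcb_observed, rcb_predicted) if o > 0]
    if not pcr or not residual:
        raise ValueError("need at least one case in each class")
    wins = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in pcr for b in residual)
    return wins / (len(pcr) * len(residual))


def repeated_kfold(n: int, k: int = 5, repeats: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """(train_idx, test_idx) pairs for `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]              # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            splits.append((train, test))
    return splits


# Hypothetical observed and predicted RCB values for five patients.
observed = [0.0, 0.0, 1.2, 2.5, 3.1]
predicted = [0.4, 1.5, 1.0, 2.0, 2.8]
auc = auc_pcr(observed, predicted)

splits = repeated_kfold(n=190)               # 190 patients, 5 folds x 10 repeats
print(round(auc, 3), len(splits))
```

In a full analysis, the model would be refit on each training set (including the choice of the penalty parameter), predictions would be made for the corresponding test set, and the AUC would be averaged over all outer-loop splits.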

For estimation of the median follow-up time after NAC, the reverse Kaplan-Meier method was used. To explore the association between %TILs and recurrence-free survival (RFS, as previously defined [21]), we used Cox regression analysis with the Box-Cox transformed %TILs as the explanatory variable, calculating hazard ratios (HR). To create Kaplan-Meier curves, the original %TILs values were stratified into two predefined groups of 1–10% and > 10% TILs, corresponding to the low vs. intermediate + high groups of a large pooled analysis [22]. We merged the intermediate and high subgroups of TILs because of the expected lower TILs counts in the ER+/HER2− group. All statistical analyses were performed in R software version 4.2.2.
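The Kaplan-Meier estimator and the reverse Kaplan-Meier estimate of median follow-up can be illustrated with a short Python sketch. It is not the authors' R code: the function names and the follow-up data are hypothetical, ties between events and censoring at the same time point are handled naively, and the grouping function simply assigns every value at or below 10% to the lower group.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct event time (1 = event, 0 = censored)."""
    curve = []
    surv = 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - n_events / at_risk
        curve.append((t, surv))
    return curve


def median_time(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which the estimated survival is 0.5 or lower, if any."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def reverse_km_median_followup(times: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """Median follow-up: Kaplan-Meier with the event indicator flipped,
    so that censoring is treated as the 'event' of interest."""
    return median_time(kaplan_meier(times, [1 - e for e in events]))


def tils_group(percent_tils: float) -> str:
    """Two strata used for the survival curves: 1-10% vs. > 10% TILs."""
    return "> 10%" if percent_tils > 10 else "1-10%"


# Hypothetical follow-up in months; 1 = recurrence, 0 = censored.
times = [12, 24, 30, 36, 48, 60]
events = [1, 0, 1, 0, 0, 0]

rfs_curve = kaplan_meier(times, events)
followup = reverse_km_median_followup(times, events)
print(rfs_curve, followup)
```

In this toy example the RFS curve never drops to 0.5, so the median RFS is not reached, while the reverse Kaplan-Meier estimate of median follow-up is 48 months.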

Results

A total of 190 patients were included in this study (129 in cohort A, 61 in cohort B), of which 106 were ER+/HER2−, 40 ER−/HER2− (TN) and 44 ER+/−/HER2+. All patients underwent surgery after NAC; 51 patients reached pCR. Median follow-up time after NAC was 58 months. There were a total of 31 RFS events (Table 1).

Table 1 Patient and tumor characteristics of breast cancer patients treated with NAC that were included in the study, overall and according to %TILs and cohort

Pearson’s correlation coefficients of %TILs in biopsy with the response variables were as follows: −0.21, −0.08, −0.01, −0.06, −0.1 and −0.12 for RCB and the relative change in number of lesions, total volume, mean volume, total largest diameter and sum of largest diameters, respectively. MRI images and descriptives of two typical patients are shown in Supplementary Fig. 1.

In total, 13% of values were missing for the change in MRI tumor load variables (15% in cohort A and 10% in cohort B) and 15% for the %TILs values (22% in cohort A and 0% in cohort B). RCB was missing in 2 cases. See Supplementary Tables 2 and 3 for missing data per cohort. The proportion of missing data was comparable among the IHC subtypes.

Assessment of response to NAC

Figure 1 depicts the ROC curves and corresponding (cross-validated) AUC of each of the models. Coefficients can be found in Supplementary Table 4. A prediction model containing the change in MRI tumor load reached an estimated CV AUC of 0.69 (95% CI 0.61–0.79) in all patients, and a model with only %TILs had an estimated CV AUC of 0.69 (95% CI 0.53–0.78). A prediction model combining %TILs and change in MRI tumor load had an estimated CV AUC of 0.75 (95% CI 0.67–0.83). The change in MRI tumor load model evaluated in the ER+/HER2− patient group yielded an estimated CV AUC of 0.67 (95% CI 0.51–0.84), the %TILs-only model an estimated CV AUC of 0.68 (95% CI 0.50–0.82), while the combined model had an estimated CV AUC of 0.72 (95% CI 0.60–0.88). For the TN&HER2+ subgroup, the change in MRI tumor load model had an estimated CV AUC of 0.67 (95% CI 0.56–0.79), the %TILs-only model an estimated CV AUC of 0.63 (95% CI 0.49–0.74), while the combined model had an estimated CV AUC of 0.70 (95% CI 0.59–0.82).

Fig. 1

ROC curves for discriminating between pCR (RCB 0) and residual disease (RCB > 0) for the different prediction models. The black line represents the mean curve over all CV loops. The dotted lines represent the 95% confidence intervals. AUC cross-validated area under the curve. CI confidence interval

Explorative analysis of TILs vs. event-free survival

%TILs was significantly associated with RFS in all patients (HR 0.72 (95% CI 0.53–0.98) for every standard deviation increase in %TILs, p = 0.038). This association did not appear substantially different in the two subgroups (HR 0.68 (95% CI 0.44–1.074), p = 0.10 in the ER+/HER2− group and HR 0.72 (95% CI 0.44–1.19), p = 0.20 in the TN&HER2+ group). Figure 2 shows the survival curves of patients with %TILs 1–10 vs. > 10 in each of the groups. %TILs in the resection specimen was also evaluated but was not available in all patients, resulting in insufficient data to properly assess the association with RFS.

Fig. 2

Kaplan-Meier curves for recurrence-free survival (RFS) stratified by %TILs in biopsy of > 10% (red line) vs. 1–10% (blue line). A RFS for all patients by %TILs. B RFS for ER+/HER2− patients by %TILs. C RFS for TN&HER2+ patients by %TILs

Discussion

In this multicenter study, we explored the combination of %TILs and change in tumor load on DCE-MRI to assess response to NAC in patients with ER+/HER2− and TN&HER2+ breast cancer. A higher CV AUC was observed for the combination of %TILs and change in tumor load on MRI compared to either one alone in the whole group. This was also observed in the ER+/HER2− group, as well as in the TN&HER2+ group.

The difference in observed discriminative ability should, however, be interpreted with caution given the wide confidence intervals.

There is a large need for improvement of response prediction, before clinical trials on postponing or omitting surgery after NAC have a good chance of succeeding [23]. Our work suggests that %TILs and MRI may hold complementary information and could be a useful combined biomarker for response to NAC in different breast cancer subtypes.

TILs have been shown by others to be correlated with pCR in the HER2+ and TN subtypes, with higher TILs relating to higher pCR rates [12, 22]. In the ER+/HER2− subtype, the literature is inconclusive. A large pooled analysis by Denkert et al. showed a significant positive correlation between TILs and pCR in the ER+/HER2− subtype [22]. A different meta-analysis and some smaller studies did not, however, find this correlation [12, 13, 24,25,26]. TILs are reported to be less frequent in ER+/HER2− breast cancer compared to the other subtypes [27], which makes it less likely that a correlation is found in smaller groups. We found an association between TILs and response to NAC as measured by RCB in the whole group of patients. One study found significant correlations between RCB classes and the TIL CD8/FOXP3 ratio in TN breast cancer [28]. A different study by Elmahs et al. did not find a correlation between TILs and RCB class, perhaps due to its small sample size [29].

In both the TN subtype and, more recently, in the luminal subtype as well, the combination of NAC with immunotherapy was shown to improve pCR [30,31,32]. In both groups treated with this combination, higher sTILs were associated with higher pCR rates [33,34,35,36]. Personalizing treatment and not giving more than necessary is of special importance in the light of high costs associated with immunotherapy. Models like the one presented in this study could potentially aid clinical decision making in treatment with the combination of NAC and immunotherapy in the future. They should however be evaluated in a population treated with this regimen.

With regard to the prognostic value of TILs, we found that higher %TILs in biopsy is associated with better RFS after NAC in the whole cohort. This suggests that %TILs could also be useful for post-NAC decision making, although its role in relation to other prognostic factors was not investigated here because there were too few events. High TILs have been reported to be correlated with better prognosis in the TN and HER2+ subtypes [12, 22, 37]. In the ER+/HER2− group, the pooled analysis by Denkert et al. reported low TILs (0–10%) to be correlated with improved disease-free survival, in contrast with our results [22]. The meta-analysis by Li et al. reported no correlation between TILs and survival [12]. Our work thus contributes to the growing body of research on the prognostic role of TILs in breast cancer. We did not have enough data to evaluate the relationship between TILs in the residual tumor and RFS, but work is underway to incorporate TILs in the residual tumor after NAC into the RCB to further stratify post-neoadjuvant prognosis [38].

Since MRI is more accurate in evaluating response to NAC than mammography, ultrasound and physical examination, it is widely used in clinical practice [39,40,41]. Radiological assessment alone is, however, not accurate enough to guide treatment decisions [42]. A (semi-)automated method for evaluating response to NAC could be of interest, since manual measurement by RECIST is associated with intra- and interobserver variability [43,44,45].

Tissue biopsy is always a part of the diagnostic pre-treatment work-up and assessing TILs in the biopsy is quick and easily implemented, possibly even more so when artificial intelligence algorithms are deployed [46]. The combination of TILs and computer extracted MRI features may therefore be an efficient use of information that is available from the clinical workflow without additional (invasive) procedures. Our results suggest the complementary value of these different data sources in assessing response to NAC, which could ultimately help in sparing patients unnecessary treatment.

Our study has several limitations. First, there was no independent cohort to perform external validation of the developed models. Second, due to the limited sample size, we were unable to account for relevant predictors such as treatment regimen and nodal status, or to evaluate the HER2+ and TN subtypes separately. A larger sample size and longer follow-up could result in more events and thus a more robust EFS analysis. Third, our two cohorts contain patients from different periods in time, which resulted in different treatment regimens that may not reflect current clinical practice. Additionally, for cohort A, not all biopsies were centrally revised for ER, HER2 and nodal status. This could result in unwanted interobserver variability in these variables, although this variability does reflect clinical practice. Lastly, the MRI processing differed between cohorts A and B. In theory, this could have impacted the results, although the methods have been shown to lead to highly correlated results [20].

In conclusion, our results show that the combination of TILs and change in tumor load on MRI is informative of response after NAC overall, as well as in the ER+/HER2− and TN&HER2+ groups separately. This could be of interest for clinical trials on de-escalating surgical intervention. More work is, however, needed to reduce uncertainty and to improve accuracy by adjusting for other predictors as well.