Main

The addition of anti-PD-(L)1 to neoadjuvant chemotherapy has changed the treatment landscape for patients with early (stage II–III) triple-negative breast cancer (TNBC)1; however, all trials evaluating the efficacy of anti-PD-(L)1 in TNBC combined it with chemotherapy1,2,3,4. This chemotherapy backbone inevitably results in a high rate of adverse events (AEs), affects quality of life and could diminish T cell activity5,6.

So far, no biomarkers have been established to predict which patients with early-stage TNBC will benefit from anti-PD-1. Therapy is currently given for a total duration of 1 year, although data in other tumor types have shown that a pathological complete response (pCR) can be reached after only a few weeks of treatment with immune checkpoint inhibition (ICI)7,8,9,10,11. Overtreatment prevention is an increasingly important consideration due to the high number of patients needed to treat to prevent one recurrence and increasing toxicity with more intense and longer treatments. Therefore, there is an urgent clinical need to optimize treatment schedules and improve patient selection for specific treatments12.

While numerous studies have integrated anti-PD-(L)1 therapy with chemotherapy in early-stage TNBC1,2,3,13, data on combination ICIs are limited. ICIs targeting CTLA4 have revolutionized treatment for non-small cell lung cancer8 and melanoma14,15,16. Additionally, neoadjuvant trials across various tumor types have shown impressive major pathological response (MPR) rates when combining anti-PD-(L)1 with low-dose anti-CTLA4 (refs. 7,8,10,17). A trial in metastatic breast cancer revealed long-lasting responses after combining low-dose anti-CTLA4 with anti-PD-1 (ref. 18), which are infrequently observed with anti-PD-(L)1 alone. These findings provide a rationale to test low-dose anti-CTLA4 in combination with anti-PD-(L)1 in early TNBC.

Simultaneously with the advent of ICI, tumor-infiltrating lymphocytes (TILs) have emerged as a putative prognostic and predictive biomarker19,20,21,22. Patients with TNBC with high TIL levels have an excellent prognosis even without chemotherapy19,23, suggesting that TILs reflect an endogenous antitumor T cell response. Moreover, in metastatic TNBC, high TIL levels are associated with response to ICI24,25. Collectively, these findings imply that TILs may serve as a tool for identifying patients with TNBC who are more likely to benefit from ICI and have a favorable prognosis, paving the way for exploring chemotherapy de-escalation. The BELLINI trial is an adaptive platform trial exploring the effect of ICI without chemotherapy starting with window-of-opportunity (WOO) cohorts with a biological end point followed by neoadjuvant cohorts with a pCR end point. This adaptive platform trial consists of sequential, single-cohort, phase 2 studies, where new cohorts can be opened based on signals obtained in previous cohorts. The first two cohorts evaluated whether 4 weeks of nivolumab (anti-PD-1, cohort A) or nivolumab and low-dose ipilimumab (anti-PD-1 and anti-CTLA4, cohort B) can lead to immune activation (primary end point). This 4-week therapy regimen was scheduled before the start of regular therapy and therefore the effect of ICI could be assessed independently of chemotherapy. Promising results in cohorts A and B among patients with high TIL levels (≥50%) led to the initiation of cohort C. In cohort C, we used a neoadjuvant design with 6 weeks of nivolumab plus low-dose ipilimumab followed by surgery to assess the pCR rate14,26.

This trial combines anti-PD-1 with anti-CTLA4 in early breast cancer and explores what pCR rate could be achieved with ICI-only approaches and using TIL levels as an entry criterion to enrich for inflamed tumors.

Results

Design and patient characteristics

The BELLINI trial (ClinicalTrials.gov registration: NCT03815890; Fig. 1a,g and Extended Data Fig. 1a) is a preoperative, WOO, phase 2, multiple-cohort nonrandomized study in early (stage I–III) breast cancer utilizing an adaptive Simon’s two-stage design27. Here, we report the initial results from the first two WOO cohorts exploring the immune-activating capacity of short-term neoadjuvant nivolumab ± ipilimumab (cohorts A and B, n = 31) in patients with ≥5% TILs as well as the initial results of cohort C that was opened based on the results of cohorts A and B. The first patient was included on 19 September 2019 and the last patient on 24 January 2023.

Fig. 1: BELLINI trial design, efficacy data and baseline biomarkers.

a, Trial design for cohorts A and B. Cohort A received two cycles of nivolumab (anti-PD-1). Cohort B received two cycles of nivolumab (anti-PD-1) and one cycle of ipilimumab (anti-CTLA4). Biopsies and blood were taken pretreatment and after 4 weeks of treatment, after which patients proceeded to standard-of-care neoadjuvant chemotherapy (n = 28) or primary surgery (n = 3). CR, complete response; WES, whole-exome sequencing. b, Numbers of patients reaching immune activation in cohorts A (n = 15) and B (n = 15). c,d, Changes in tumor size according to the MRI for cohort A (c) and cohort B (d). The dashed line at −30% indicates radiological PR. The green bars indicate clinical responses (radiological PR and/or pathological response). Asterisks (*) represent patients with resection after ICI only (n = 3). pPR, pathological PR according to EUSOMA; SLD, sum of longest diameters. e, TILs in pretreatment biopsies of patients with and without clinical response in cohorts A and B. n = 31 patients. f, Combined positive PD-L1 score (CPS) in pretreatment biopsies of patients with and without clinical response in cohorts A and B. n = 31 patients. g, BELLINI trial design for cohort C. Cohort C (n = 15) received two cycles of nivolumab and ipilimumab on days 1 and 21. Biopsies and blood were taken pretreatment and after 6 weeks. Patients proceeded to primary surgery (n = 15). h, pCR and MPR (<10% viable tumor left) rates in cohort C. NR, nonresponse. i, Changes in tumor size according to the MRI in cohort C. The dashed line at −30% indicates radiological PR. Dark blue bars show pCR. j, TILs in pretreatment biopsies of patients according to pCR status in cohort C. n = 15 patients. k, CPS in pretreatment biopsies for patients according to pCR status in cohort C. n = 15 patients. Panels a,g were created with BioRender.com. TIL levels were calculated as the average of the levels in the diagnostic and pretreatment study biopsies (e,j). Boxplots display minimum (Q0), maximum (Q4), median (Q2) and IQR (e,f,j,k). P values were derived using a two-sided Mann–Whitney test.

Cohort A (n = 16) received two cycles of nivolumab (240 mg) on days 1 and 15. Cohort B (n = 15) received two cycles of nivolumab (240 mg) on days 1 and 15, plus one cycle of ipilimumab (1 mg kg⁻¹) on day 1. To exclude patients with a poor prognosis, who are less likely to respond to ICI and are not suitable for chemotherapy de-escalation, we enrolled only patients with ≥5% TILs in cohorts A and B. Baseline characteristics were similar between cohorts A and B, except for a higher proportion of patients with positive lymph nodes in cohort B (Table 1).

Table 1 Baseline patient characteristics

The primary end point for cohorts A and B was immune activation, defined as at least a twofold increase in CD8+ T cells (measured by immunohistochemistry (IHC); Extended Data Fig. 1b–f) and/or increased interferon-γ (IFNG) gene expression. This end point was based on the observation that increases in intratumoral CD8+ T cells25,28 and higher IFNG signature scores17,29 in serially biopsied tumors are correlated with responses to anti-PD-(L)1.
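The end point definition above can be written as a small decision rule. The following is a minimal sketch, not the trial's analysis code: the function names are hypothetical, inputs are assumed to be non-negative values on a linear scale, and applying the same twofold cut-off to IFNG is an assumption (the definition above only says "increased").

```python
def fold_change(pre: float, post: float) -> float:
    """Return post/pre; a rise from zero is treated as an infinite fold change."""
    if pre < 0 or post < 0:
        raise ValueError("values are assumed to be non-negative")
    if pre == 0:
        return float("inf") if post > 0 else 1.0
    return post / pre


def immune_activation(cd8_pre: float, cd8_post: float,
                      ifng_pre: float, ifng_post: float,
                      min_fold: float = 2.0) -> bool:
    """True if CD8+ T cell density and/or IFNG expression rises by at least min_fold.

    Assumption: the twofold cut-off is applied to both read-outs.
    """
    cd8_hit = fold_change(cd8_pre, cd8_post) >= min_fold
    ifng_hit = fold_change(ifng_pre, ifng_post) >= min_fold
    return cd8_hit or ifng_hit


# Hypothetical values for illustration only.
print(immune_activation(100.0, 250.0, 1.0, 1.1))  # CD8+ density more than doubles
print(immune_activation(100.0, 150.0, 1.0, 1.5))  # neither read-out doubles
```

Because the rule is an "and/or", a patient counts as immune-activated when either read-out alone crosses the cut-off.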

Clinical response (secondary end point) in cohorts A and B was defined as partial response (PR)/complete response on magnetic resonance imaging (MRI) (RECIST v.1.1) or no viable tumor in post-treatment biopsy for patients proceeding to neoadjuvant chemotherapy. For patients directly proceeding to surgery, this was defined as PR or pCR (European Society of Mastology; EUSOMA). Other secondary end points included safety and translational analyses. MRI scans and biopsies were collected at baseline and after two ICI cycles.
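As an illustration of the radiological part of this definition, the sketch below computes the percentage change in the sum of longest diameters (SLD) and applies the −30% partial-response threshold of RECIST v.1.1. The function names are hypothetical, and the combined rule only covers patients who proceeded to neoadjuvant chemotherapy (MRI response or no viable tumor in the post-treatment biopsy).

```python
def percent_change_sld(baseline_mm: float, followup_mm: float) -> float:
    """Percentage change in SLD relative to baseline (negative means shrinkage)."""
    if baseline_mm <= 0:
        raise ValueError("baseline SLD must be positive")
    return 100.0 * (followup_mm - baseline_mm) / baseline_mm


def radiological_response(baseline_mm: float, followup_mm: float,
                          pr_threshold: float = -30.0) -> bool:
    """True for a partial or complete response, i.e. a decrease of at least 30%."""
    return percent_change_sld(baseline_mm, followup_mm) <= pr_threshold


def clinical_response(baseline_mm: float, followup_mm: float,
                      viable_tumor_in_biopsy: bool) -> bool:
    """Clinical response for a patient proceeding to neoadjuvant chemotherapy."""
    return radiological_response(baseline_mm, followup_mm) or not viable_tumor_in_biopsy


# Hypothetical measurements for illustration only.
print(percent_change_sld(40.0, 28.0))       # -30.0
print(clinical_response(40.0, 30.0, True))  # -25% on MRI, viable tumor present
```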

Efficacy of short-term nivolumab and nivolumab + ipilimumab in early TNBC (WOO)

Immune activation was achieved in eight tumors (53.3%) in the nivolumab cohort (A) and nine (60%) in the nivolumab + ipilimumab cohort (B) (Fig. 1b). Therefore, both cohorts met the Simon’s two-stage27 threshold for expansion to stage II. After 4 weeks, patients proceeded to standard neoadjuvant chemotherapy followed by surgery (n = 28) or surgery without neoadjuvant chemotherapy (n = 3). Clinical response was observed in 12 of 31 patients (38.7%, 95% CI 23.7–56.2%) with 7 of 31 patients (22.6%, 95% CI 11.4–39.8%) having a PR according to RECIST v.1.1 criteria30 (Fig. 1c,d). Ten of 31 patients had no viable tumor in the biopsy and in the three patients who underwent surgery directly after ICI, two PRs and one pCR were seen. Despite these clear pathological responses, MRI showed modest downsizing, indicating that MRI underestimates early ICI response (Extended Data Fig. 1h), consistent with findings in early-stage melanoma31, colorectal and gastroesophageal cancers17,32. Notably, clinical response was only observed for patients with TILs ≥ 30% (Fig. 1e) and a combined positive score (CPS) PD-L1 ≥ 20% (Fig. 1f). Patients with lower pretreatment CD8+ T cell levels were more likely to achieve immune activation (Extended Data Fig. 1g), likely because tumors with high baseline values have less room for a doubling or because highly inflamed tumors mount a very early immune response.
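The text does not state how the 95% confidence intervals for the clinical response rates were computed. As a hedged observation, a Wilson score interval reproduces the values reported above for 12 of 31 and 7 of 31 patients. The sketch below uses only the standard library; the function name is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959963984540054):
    """Two-sided Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


for k, n in [(12, 31), (7, 31)]:
    low, high = wilson_interval(k, n)
    print(f"{k}/{n}: {100 * k / n:.1f}% (95% CI {100 * low:.1f}-{100 * high:.1f}%)")
```

Unlike the simple normal approximation, the Wilson interval stays within 0–100% and behaves reasonably for small cohorts such as these.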

Short-term neoadjuvant nivolumab + ipilimumab can induce pathological responses in patients with high TIL levels

Both cohorts A and B met the predefined thresholds of the Simon’s two-stage design27, allowing expansion to stage II; however, given the promising clinical responses observed in cohorts A and B and the approval of neoadjuvant pembrolizumab plus chemotherapy4, the study team decided not to proceed to stage II with the WOO design but to open cohort C with a true neoadjuvant design (n = 15; Fig. 1g). As all patients with a clinical response in cohorts A and B had high TIL levels, cohort C was opened for patients with ≥50% TILs and allowed only patients with node-negative disease, as for this patient population chemotherapy de-escalation could be an option in the future. The treatment schedule with combination ICI for cohort C was based on our data obtained in cohorts A and B as well as on the well-established, effective and tolerable combination ICI schedule in melanoma14,26.

Patients in cohort C underwent a 6-week treatment regimen of nivolumab and ipilimumab (administered on days 1 and 21), followed by surgery (Fig. 1g). Five patients had a pCR (33.3%, 95% CI 15.2–58.3%; Fig. 1h) with confirmed tumor-negative lymph nodes (ypT0N0). Less than 10% viable tumor remaining was seen in 3 of 15 patients (20%, 95% CI 4–48%; Fig. 1h), resulting in a total MPR rate of 8/15 patients (53%, 95% CI 27–79%). Notably, of the five patients with a pCR only one had a complete radiological response (Fig. 1i). Because of high TILs, N0 status and pCR, which are all very favorable prognostic features, all five patients with a pCR were offered the option of omitting adjuvant chemotherapy and all chose not to undergo adjuvant chemotherapy (shared decision). Patients without pCR were advised to have adjuvant chemotherapy.
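The intervals quoted for 3 of 15 and 8 of 15 patients are consistent with an exact (Clopper–Pearson) binomial interval, although the method is not named in the text, so this is an observation rather than a statement about the authors' analysis. The following sketch solves for the bounds by bisection on the binomial tail probabilities, using only the standard library; function names are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, target: float, iters: int = 100) -> float:
    """Bisection for f(p) = target on [0, 1], where f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes out of n."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


for k in (3, 8):
    low, high = clopper_pearson(k, 15)
    print(f"{k}/15: 95% CI {100 * low:.0f}-{100 * high:.0f}%")
```

The exact interval is conservative, which is why it is wider than a score-based interval would be for the same counts.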

Safety data and follow-up

Toxicity data are summarized in Table 2 (all events required steroids or persisted) and detailed in Extended Data Table 1. Neither neoadjuvant nivolumab nor nivolumab + ipilimumab resulted in previously unreported toxicities. All patients were monitored for immune-related (IR) toxicities until 1 year after ICI therapy. Treatment-related AEs of any grade occurred in 41 of 46 patients (89%). A total of eight (17%) patients developed grade ≥ 3 treatment-related AEs, of whom six were treated in cohort C. Except for the endocrinopathies, all AEs resolved. Notably, 19 of 46 patients (41%) developed treatment-related hypothyroidism. All patients with hypothyroidism remain dependent on replacement therapy. Six patients (13%) developed adrenal insufficiency and require ongoing corticosteroid replacement therapy. One patient developed diabetic ketoacidosis and remains dependent on insulin.

Table 2 Summary of adverse events

All patients proceeded with tumor resection or neoadjuvant chemotherapy as scheduled. A total of 44 patients received both ICI doses and 2 patients only received one dose due to suspected immunotoxicity.

With a median follow-up duration of 32.5 months in cohorts A and B (interquartile range (IQR) 28.1–40.3 months), one patient in cohort A (cT2N0; intermediate TILs) developed a second primary tumor and one patient in cohort B (cT2N1; intermediate TILs) died from metastatic TNBC despite receiving standard-of-care (neo)adjuvant chemotherapy. The median follow-up for cohort C was 17.6 months (IQR 18.8–22.1 months). One patient (no response to ICI) refused adjuvant chemotherapy and radiotherapy and developed recurrent TNBC (pT1cNx, 80% TILs).

Pretreatment composition of the tumor microenvironment is associated with ICI response

Due to the limited sample size, we compared clinical responders versus nonresponders from both cohorts (A and B) combined rather than for each cohort separately. Clinical responders in cohorts A and B had higher pretreatment TILs (P = 0.0014; Fig. 1e) and PD-L1 scores (P = 8.6 × 10⁻⁵; Fig. 1f) compared to nonresponders. CD8+ T cell density was not associated with clinical response (Fig. 2a and Extended Data Fig. 1b–f). Spatial analysis showed that responders had shorter distances from tumor cells to the nearest CD8+ T cells (P = 0.00001; Fig. 2b). Responders also exhibited a higher density of double-positive CD8+PD-1+ cells (P = 0.02; Extended Data Fig. 2a) and PD-1+ cells (P = 0.001, IHC; Extended Data Fig. 2b) before treatment.

Fig. 2: Pretreatment immune activation associated with clinical response.

a, CD8+ density (IHC) in pretreatment biopsies of patients with and without clinical response in cohorts A and B. n = 31 patients. b, Median distances (µm) from tumor cells to the nearest CD8+ T cells in pretreatment biopsies of patients with and without clinical response in cohorts A and B. n = 31 patients. c, IFNG gene expression scores in pretreatment biopsies of patients with and without clinical response in cohorts A and B. n = 28 patients. d, CD8+ density (IHC) in pretreatment biopsies of patients with and without pCR in cohort C. n = 14 patients. e, Median distances from tumor cells to the nearest CD8+ T cells in pretreatment biopsies of patients with and without pCR in cohort C. n = 14 patients. f, IFNG gene expression scores in pretreatment biopsies of patients with and without pCR in cohort C. n = 14 patients. g,h, Gene set enrichment expression scores in pretreatment biopsies of patients with and without clinical response in cohorts A and B (n = 28 patients (g)) or pCR (n = 14 patients (h)) in cohort C. Heatmaps include Expanded immune signature56, Immunogenic cell death signature57, Hallmark IFNA response gene set, Hallmark inflammatory response gene set, cGAS–STING pathway gene set58, Effector CD8+ T cell gene set59, Exhausted T cell gene set59, Checkpoint molecules gene set59, Naive T cell gene set60, Tertiary lymphoid structures gene set61, Hallmark TGF-β signaling gene set, Hallmark Notch signaling gene set. Asterisks represent the P values. *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001. Reported P values were significant after Benjamini–Hochberg (FDR) correction at the 10% significance level. Boxplots display minimum (Q0), maximum (Q4), median (Q2) and IQR (a–f). P values were derived using a two-sided Mann–Whitney test.

In cohort C, TILs were not different between responders and nonresponders, probably due to the more homogeneous patient population with only patients with ≥50% TILs (Fig. 1j). In line with this, patients with pCR had similar PD-L1 scores, CD8+ T cell density (cells per µm²) and distances from tumor to nearest CD8+ T cells as patients without pCR (Figs. 1k and 2d,e).

We found no association between tumor mutational burden and clinical response (Extended Data Fig. 2c,d). There were no statistically significant differences between clinical responders and nonresponders in TNBC subtypes33 (Extended Data Fig. 2e).

Tumors of clinical responders harbor pre-existing inflammatory profiles and tumor-specific CD8+ T cells

We conducted in-depth analyses between clinical responders and nonresponders using bulk RNA-seq (all cohorts) and single-cell RNA sequencing (scRNA-seq) and TCR sequencing (cohorts A and B) pre- and post-treatment. Bulk RNA-seq revealed higher pretreatment levels of IFNG gene expression (P = 0.0003; Fig. 2c) and inflammatory gene signatures in clinical responders (P < 0.05 for all, false discovery rate (FDR) 10%; Fig. 2g and Extended Data Fig. 3a–e). Clinical responders also exhibited higher gene signatures associated with immune infiltration (P < 0.05 for all, FDR 10%; Fig. 2g and Extended Data Fig. 3f–j). Conversely, clinical nonresponders displayed upregulation of TGF-β and Notch signaling (P < 0.05 for both, FDR 10%; Fig. 2g and Extended Data Fig. 3l–m). Although TIL levels and distances from tumor cells to CD8+ T cells did not differ between responders and nonresponders in cohort C, which included TIL-high patients only, patients with pCR had significantly higher pretreatment IFNG gene expression (Fig. 2f) and higher scores of gene signatures related to immune response and T cell infiltration (Fig. 2h), consistent with our previous observations of a more inflammatory profile of the tumor microenvironment in clinical responders in cohorts A and B.
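For readers less familiar with the FDR threshold quoted above, the following is a minimal sketch of the Benjamini–Hochberg step-up procedure at a 10% level. The P values in the example are hypothetical and are not taken from the trial; the function name is likewise an illustration.

```python
def benjamini_hochberg(pvalues, fdr: float = 0.10):
    """Return a list of booleans: True where the hypothesis is rejected at the given FDR.

    Step-up rule: find the largest rank k with p_(k) <= (k / m) * fdr and
    reject the k smallest P values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical P values for five gene signatures.
example = [0.001, 0.04, 0.03, 0.2, 0.5]
print(benjamini_hochberg(example, fdr=0.10))  # [True, True, True, False, False]
```

With five tests, the thresholds are 0.02, 0.04, 0.06, 0.08 and 0.10 for the ranked P values, so the three smallest pass and the remaining two do not.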

After scRNA-seq data preprocessing, we obtained 80,000 high-quality T cells from 52 samples (29 patients). Following unsupervised clustering of the T cells, we identified various subpopulations (Fig. 3a–d and Extended Data Fig. 4a–t), including CD8+ effector T cells, CD8+ tissue resident memory (CD8+ TRM) T cells, proliferating CD8+ T cells, naive CD4+ T cells, follicular helper T (TFH) cells, memory CD4+ T cells, regulatory T (Treg) cells, CD56bright and CD56dim natural killer cells. Notably, we identified a cluster of CD8+ T cells with features of tumor-specific T cells. This cluster was characterized by the highest clonality and highest expression of tumor recognition signatures derived using functional tumor recognition experiments34,35 (Fig. 3c,d). This CD8+ tumor-specific cluster was marked by high expression of tumor-reactive markers (CD39, CD103 and PDCD1), IFNG, effector molecules (GZMB, NKG7, PRF1 and GNLY), chemokines (CCL5, CCL4, CXCL13 and CCL3) and exhaustion markers (LAG3, HAVCR2, TIGIT, TOX and CTLA4; Fig. 3c,d). Clinical responders exhibited higher fractions of pretreatment CD8+ tumor-specific T cells (Fig. 3e). Clinical responders also had higher fractions of CD4+ TFH cells (Fig. 3f). The presence of tumor-specific CD8+ T cells and TFH in pretreatment biopsies was correlated with tumor decrease on MRI, indicating a continuous association between the abundance of these cells before treatment and the depth of the tumor response (Extended Data Fig. 4u,v). Patients with different TIL levels had similar T cell subtypes before treatment (Extended Data Fig. 4w).

Fig. 3: Pretreatment T cell profiles of the tumor microenvironment and peripheral blood associated with clinical response in cohorts A and B.

a, UMAP representation of the T cell clusters in the scRNA-seq dataset (cohorts A and B). n = 52 samples from 29 patients, 80,000 cells. NK, natural killer. b, Fractions of different T cell populations relative to all T cells in the pretreatment biopsies from clinical responders (left) and nonresponders (right) in cohorts A and B. c, Dotplot illustrating markers of different T cell clusters based on scRNA-seq data (cohorts A and B). d, Dotplot illustrating differences in tumor reactivity markers in different T cell clusters based on scRNA-seq data (cohorts A and B). Wu_signature, CD8+ T cell tumor specificity signature34; CD4_NeoTCR, CD4+ T cell tumor specificity signature35; CD8_NeoTCR, CD8+ T cell tumor specificity signature35. e, Tumor-specific CD8+ T cell fractions relative to all T cells in pretreatment biopsies of patients with and without clinical response (cohorts A and B). n = 25 patients. f, TFH fractions relative to all T cells in pretreatment biopsies of patients with and without clinical response (cohorts A and B). n = 25 patients. g,h, Ki-67 expression on PD-1+CD8+ T cells (g) and conventional CD4+ T cells (h) pretreatment in peripheral blood of patients with and without clinical response in cohorts A and B. n = 25 patients. i, Dotplot for PDCD1 and MKI67 expression in CD4+ T cell clusters (tumoral, scRNA-seq, cohorts A and B). j, Dotplot for PDCD1 and MKI67 expression in CD8+ T cell clusters (tumoral, scRNA-seq, cohorts A and B). k, Fraction of proliferating PD-1+CD8+ T cells relative to all T cells in pretreatment biopsies of patients with and without clinical response based on scRNA-seq data (cohorts A and B). n = 25 patients. l, Fraction of Ki-67+ TFH cells relative to all T cells in pretreatment biopsies of patients with and without clinical response (cohorts A and B). n = 25 patients. Boxplots display minimum (Q0), maximum (Q4), median (Q2) and IQR (e,f,k,l). P values were derived using a two-sided Mann–Whitney test. NS, not significant.

Flow cytometry of blood samples (19 markers; Extended Data Table 2 and Extended Data Fig. 5a) revealed increased Ki-67+ cells within the PD-1+ conventional CD4+ T cell population in clinical responders (P = 0.005; Fig. 3g). A similar trend was observed for CD8+ T cells (Fig. 3h). The increased proliferation of PD-1+CD4+ T cells observed in the blood could also be traced back to the tumor, with responders having higher levels of Ki-67+ TFH, which was the CD4+ T cell cluster with the highest PDCD1 gene expression in the tumor scRNA-seq data (Fig. 3i,l). In line with the blood data, the levels of PD-1+ proliferating CD8+ T cells were not significantly different between clinical responders and nonresponders (Fig. 3j,k), suggesting a specific role for proliferating CD4+ T cells systemically as well as in the tumor microenvironment (TME).

Dynamics and post-treatment composition of the tumor microenvironment are distinct in clinical responders and nonresponders

Single-cell RNA-seq analysis revealed that although the clinical responders had higher proportions of tumor-specific CD8+ T cells before treatment, after treatment their tumors included higher levels of effector CD8+ T cells compared to nonresponders (P = 0.008; Fig. 4a,b). This suggests that effector CD8+ T cells contribute to ICI-induced tumor regression and underscores the ongoing antitumor CD8+ T cell response, even 4 weeks after treatment initiation.

Fig. 4: Effects of anti-PD-1 ± anti-CTLA4 on the T cell profiles in the tumor microenvironment after treatment in cohorts A and B: ctDNA data for all cohorts.

a, Fractions of different T cell clusters relative to all T cells in post-treatment biopsies of patients who did (left) and did not (right) experience clinical response based on scRNA-seq data. b, Effector CD8+ T cell fractions relative to all T cells in post-treatment biopsies versus clinical response (cohorts A and B). n = 26 patients. c, Memory CD4+ T cell fractions relative to all T cells in post-treatment biopsies versus clinical response (cohorts A and B). n = 26 patients. d, Treg cell fractions relative to all T cells in post-treatment biopsies versus clinical response (cohorts A and B). n = 26 patients. e, Fractions of Treg cells relative to all T cells in post-treatment biopsies of patients (cohorts A and B) in relation to the change in tumor volume after treatment assessed using MRI (RECIST v.1.1). n = 26 patients. f–h, Fold changes in fractions of T cell populations relative to all T cells in cohort A and cohort B. n = 22 patients. TFH cells (f). Naive CD4+ T cells (g). Treg cells (h). i, Changes in ctDNA levels of responding and nonresponding patients upon treatment. Patients from all cohorts (A, B and C) for whom ctDNA analysis was performed and ctDNA was detected at baseline (n = 32) were included. j, Waterfall plot of all patients for whom ctDNA analysis was performed (n = 43, all cohorts) colored according to the fold change in ctDNA levels in blood upon treatment. The groups represent ctDNA clearance; post-therapy decrease in ctDNA levels of 50% or more; no ctDNA at baseline; and no decrease in ctDNA. The dashed line at −30% indicates radiological PR. k, Barplots summarizing the number of patients for each ctDNA response category in each cohort (A, B and C). ctDNA at baseline was available for 43 of 46 patients. Boxplots display minimum (Q0), maximum (Q4), median (Q2) and IQR (b–d,f–i). P values (b–d,f–h) were derived using a two-sided Mann–Whitney test. P values in i were derived using a paired Wilcoxon test.

Conversely, nonresponders had elevated memory CD4+ T cells (P = 0.05; Fig. 4a,c) and Treg cells (P = 0.02; Fig. 4a,d) post-treatment, potentially suggesting the involvement of Treg cells in mediating resistance to ICI, consistent with previous studies36. Notably, we observed an association between the fraction of Treg cells after treatment and the lack of response or, in some patients, even an increase in tumor volume on MRI (Fig. 4e). This correlation was specifically driven by activated (CD137+) Treg cells, rather than nonactivated Treg cells (Extended Data Fig. 5b,c).

We also investigated whether the addition of anti-CTLA4 led to differential alterations in the TME compared to nivolumab monotherapy, although the study was not powered for cohort comparisons. Patients receiving nivolumab plus ipilimumab showed a reduced fold change in TFH cells (P = 0.02; Fig. 4f), but an increased fold change in naive CD4+ T cells (P = 0.03; Fig. 4g). Additionally, the combination ICI resulted in a decreased fold change in Treg cells (P = 0.01; Fig. 4h) compared to monotherapy, including both activated and non-activated Treg cells (Extended Data Fig. 5d,e).

ctDNA dynamics during early response to ICI

To assess the impact of short-term ICI on circulating tumor DNA (ctDNA), we conducted ctDNA analysis pretreatment and after 4 weeks (cohorts A and B) or 6 weeks (cohort C) of ICI using a tumor-informed ctDNA assay (Signatera). Despite the early tumor stages included (mostly I–II), pretreatment ctDNA was detected in 32 of 43 (74%) patients. After treatment, nine (21%) patients had complete ctDNA clearance, while an additional seven patients had a reduction of ≥50% in ctDNA load (mean tumor molecules (MTM) per ml; Fig. 4i,j). All clinical responders in cohorts A and B and patients with pCR/MPR (n = 8) in cohort C demonstrated at least a 50% drop in ctDNA or were negative for ctDNA at baseline (Fig. 4i–k).
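The ctDNA response groups used here (clearance, a decrease of 50% or more, no ctDNA at baseline, no decrease) can be expressed as a simple classifier on paired MTM per ml values. This is a minimal sketch with hypothetical function and label names. One assumption is labeled in the code: a reduction smaller than 50% is placed in the "no decrease" group, because the source does not say how such cases were handled.

```python
NO_BASELINE = "no ctDNA at baseline"
CLEARANCE = "ctDNA clearance"
DECREASE_50 = "decrease of 50% or more"
NO_DECREASE = "no decrease"


def ctdna_category(baseline_mtm: float, post_mtm: float) -> str:
    """Classify a patient's ctDNA response from pre- and post-treatment MTM per ml."""
    if baseline_mtm < 0 or post_mtm < 0:
        raise ValueError("MTM per ml cannot be negative")
    if baseline_mtm == 0:
        return NO_BASELINE
    if post_mtm == 0:
        return CLEARANCE
    reduction = (baseline_mtm - post_mtm) / baseline_mtm
    if reduction >= 0.5:
        return DECREASE_50
    # Assumption: reductions below 50% are grouped with stable or rising ctDNA.
    return NO_DECREASE


# Hypothetical values for illustration only.
for pre, post in [(0.0, 0.0), (12.0, 0.0), (12.0, 5.0), (12.0, 15.0)]:
    print(pre, post, "->", ctdna_category(pre, post))
```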

Discussion

In this study, we demonstrate that neoadjuvant nivolumab, with or without ipilimumab, is a feasible chemotherapy-free regimen for patients with early-stage TNBC. We show that nivolumab ± ipilimumab induces immune activation in the majority of patients and can result in pCR and ctDNA clearance. Pre-existing inflammatory features such as higher TILs, shorter distances from CD8+ T cells to the tumor and higher baseline fractions of tumor-specific CD8+ T cells were associated with response. In contrast, higher fractions of Treg cells post-treatment were associated with lack of response. While standard chemo-immunotherapy for TNBC with four chemotherapy agents plus anti-PD-1 is a 5-month treatment regimen leading to a 63% pCR rate, our work suggests that with only 6 weeks of anti-PD-1 plus low-dose anti-CTLA4, a 33% pCR rate may be obtained in TNBC with high TILs. This suggests that for some patients a short-term immunotherapy-first approach may be an option if confirmed by future research in larger cohorts with a more robust follow-up; however, a substantial group of patients still needs chemotherapy and/or longer treatment to obtain a pCR. Although we did not observe any unexpected toxicity, the rate of persisting endocrinopathies, in particular hypothyroidism, was high compared to reports in other tumor types or in breast cancer when anti-PD-(L)1 is added to neoadjuvant chemotherapy. Although the 33% pCR rate would allow expansion of cohort C to stage II, with 40% grade 3–4 toxicity, 40% hypothyroidism and 20% adrenal gland insufficiencies, substantial toxicity is a serious concern, especially considering the relatively good prognosis of patients with TNBC with high TILs.

The BELLINI trial has investigated the feasibility and potential efficacy of ICI without concurrent chemotherapy in early-stage TNBC. The scoring of TILs is used as an inclusion criterion to select patients with a good prognosis for whom development of de-escalated treatment regimens is most promising. Larger clinical trials also using TILs according to this workflow when including patients have recently started (NCT05929768). In addition, the ETNA trial (NCT06078384) will explore whether patients with stage I TNBC with high TILs can forgo (neo)adjuvant chemotherapy or be treated with immunotherapy alone. The larger international OPTImaL patient preference study (NCT06476119) will also allow the option of no chemotherapy for this patient population. In addition, other studies use TILs as inclusion criteria for immunotherapy-first approaches: Pop-Durva (NCT05215106) and pan-cancer NEOASIS trial (NCT06279130). Further studies that are sufficiently powered to assess long-term outcomes are needed on the use of TILs or other immune-based biomarkers as entry criteria for immunotherapy or de-escalation studies, especially as patients with lower stage TNBC and high TILs can have an excellent outcome with local treatment alone19,37.

Immune-related endocrine disorders were the most common AEs observed. Specifically, 41% of the patients developed hypothyroidism, which, though usually easy to manage, is a permanent condition and 13% developed adrenal insufficiency, a serious long-term toxicity. Comparable neoadjuvant ICI-only studies with nivolumab + low-dose ipilimumab in head and neck squamous carcinoma, colorectal cancer, urothelial carcinoma and melanoma reported hypothyroidism in 4–8% of patients9,10,11,14 and adrenal insufficiency in 0–8% of patients9,10,11,14; however, the largest recent phase 3 trial (stage III melanoma, n = 423) reports substantially higher rates of endocrinopathies with 23.6% hypothyroidism and 9.9% adrenal gland insufficiency26. Notably, for cancer types with poor prognosis such as stage III melanoma, high toxicity rates might be acceptable, whereas this is different for patient populations with more favorable outcomes. The higher rates of hypothyroidism and adrenal insufficiencies in BELLINI compared to these studies could stem from different patient demographics. Patients with TNBC are typically female and relatively young, potentially contributing to different systemic immunity and AE incidence38. In BELLINI, we reported all immune-mediated AEs during the first year of follow-up, with 4 of 6 patients developing adrenal insufficiency >100 days after inclusion. Trials with shorter reporting periods may miss these late events, leading to underreported delayed toxicity, especially in centers not specialized in evaluating ICI regimens. When focusing on patients with similar demographics and disease, we still observe a higher rate of endocrine AEs in BELLINI compared to neoadjuvant trials for TNBC evaluating ICI plus chemotherapy. The KEYNOTE-522 trial reported thyroid dysfunction in 22% of patients treated with anti-PD-1 plus chemotherapy1. Adrenal insufficiency/hypophysitis was reported for 4.5% of patients in the KEYNOTE-522 study. A recent study with an oncolytic virus without chemotherapy found that 3 of 6 patients with breast cancer developed hypothyroidism39, which is more in line with our observations. The lower hypothyroidism rate in KEYNOTE-522 compared to the oncolytic virus study39 and BELLINI could suggest that chemotherapy results in partial blunting of the immune response. Last, the preselection of patients with higher TILs in BELLINI may have resulted in patients who are more likely to develop IR AEs due to different systemic immunity. We also cannot rule out the influence of chemotherapy given after ICI, where steroids are used as antiemetics. Our cohort sizes are too small to compare toxicities induced by 4-week nivolumab versus 4-week nivolumab + ipilimumab versus 6-week nivolumab + ipilimumab; however, in the latter group, we observed more non-endocrinopathies such as colitis, hepatitis and pneumonitis, whereas endocrinopathies were already remarkably high with nivolumab monotherapy. This potentially signifies that neoadjuvant ICI without chemotherapy could result in a higher rate of hypothyroidism in patients with breast cancers. Of note, it was demonstrated that immunotherapy-related thyroid dysfunction and other IR AEs are associated with improved survival in multiple cancer types40,41,42,43. Nevertheless, upfront prediction of risk of immunotherapy-related toxicity for individual patients is a large unmet clinical need and the burden of AEs should be evaluated in light of the prognosis of each patient44.

The advantage of WOO studies such as BELLINI is the opportunity to evaluate promising drugs and drug combinations in an efficient manner and to analyze pre- and post-treatment tumor material that can provide insights into the therapy effects. Our primary end point of immune activation, defined as a doubling of CD8+ T cells and/or IFNG expression, was reached in 17 of 30 patients (57%). Although both cohorts reached the >30% immune activation rate, allowing cohort expansion, we observed more doubling of CD8+ T cells in patients with low pretreatment levels of these features. This could be due to the biopsy timing with deep responses at 4 weeks in tumors with high endogenous CD8+ T cells and/or a ‘saturation’ of CD8+ T cells in patients with high pretreatment values. In contrast to CD8+ T cells, IFNG counts may double even with high pretreatment values; however, they could also be impacted by decreased antigen availability in case of tumor regression. This suggests that different biomarker approaches could apply to inflamed and noninflamed tumors. Recent insights from the developments of personalized neoadjuvant immunotherapy in melanoma indicate that patients with high pre-existing IFNG levels or an increase in IFNG signature upon treatment were most likely to benefit45. The disadvantage of WOO designs with short scheduled treatments is the nonguaranteed benefit for participating patients. Also, information on established end points such as pCR rate is needed before a new treatment approach will be tested in larger trials. For this reason, the adaptive BELLINI trial allowed the opening of new cohorts with established end points to bring therapies to the next step. Although allowed by the protocol and statistical analysis plan, reporting only stage I data of a Simon’s two-stage design comes with the risk of false-positive findings. 
Similarly to cohorts A and B, cohort C also reached the threshold of sufficient responders to expand into stage II; however, given the relatively high rate of endocrinopathies, which are chronic, cohort C was not expanded to stage II. In this view, testing new anti-CTLA4-targeting antibodies, such as botensilimab46, intentionally designed to overcome the limitations of conventional ICI such as persisting endocrinopathies could be interesting for patients with breast cancer.

When analyzing pretreatment tumor characteristics in high-TIL tumors only (cohort C), we found that the inflammatory phenotype and markers were still discriminative between responders and nonresponders and remarkably similar to the clinical responders and nonresponders in cohorts A and B. In cohort C, pCRs had higher inflammatory gene expression profiles pretreatment, including signatures for IFNG response, checkpoint molecules, exhausted CD8+ T cells and immunogenic cell death. This suggests that, even in patients with high TILs, the profiling of baseline inflammatory status may facilitate early identification of (non)responders and should be considered in addition to TILs.

The recent publication of the tumor-specific T cell signatures34,35 enabled us to identify and follow tumor-specific CD8+ T cells in a clinical trial setting. Notably, using these signatures as a proxy for the tumor reactivity, we demonstrate that the presence of tumor-specific CD8+ T cells pretreatment is linked to ICI response.

Additionally, we observed decreased fractions of Treg cells in clinical responders compared to nonresponders after treatment, in line with previous reports on the role of Treg cells in resistance to ICI47. In a resistant mouse tumor model, anti-PD-L1 therapy led to Treg cell activation and Treg cells were shown to be activated in the single-cell data of patients with non-small cell lung cancer and basal cell carcinoma not responding to anti-PD-(L)1 ICI36. In this recent study, ICI treatment induced higher expression of genes involved in Treg cell-mediated immune suppression (PDCD1, CTLA4 and CD38) and cell cycle (MKI67) in Treg cells from the tumors of nonresponders36. Together, these findings demonstrate that Treg cells might play a critical role in resistance to ICI.

To date, data on combining anti-PD-(L)1 with low-dose anti-CTLA4 were lacking in early-stage breast cancer. Due to the noncomparative design and the small sample size, our data on the potential additive effect of ipilimumab should be considered exploratory. At the single-cell level, the addition of ipilimumab resulted in a lower fold change in Treg cells in the TME upon treatment. We also observed a correlation between higher levels of activated Treg cells post-treatment and the lack of response or in some cases even slight increase in tumor volume on MRI. This suggests that activated Treg cells play a role in resistance to immune checkpoint blockade and that depleting activated Treg cells could be a promising strategy for patients with TNBC who are unresponsive to anti-PD-1-based treatments. Of note, we cannot exclude that the lack of response or the increase of tumor volume observed by imaging was in part due to pseudoprogression. A growing body of literature analyzing anti-CTLA4 using in vivo models indicates that anti-CTLA4 can deplete Treg cells48; however, whether anti-CTLA4 can deplete Treg cells in human tumors remains a matter of debate49. A recent study by van der Leun et al. in head and neck squamous cell carcinoma also demonstrated an increase in transitional CD8+ T cells and a decrease in CD137+ Treg cells in responders after treatment with anti-PD-1 and anti-CTLA4 therapy50, indicating that this might be a consistent pattern across multiple tumor types.

After the results of the landmark trials in early-stage TNBC that added PD-1 blockade to standard neoadjuvant chemotherapy1,3,51,52, our current data provide a rationale to further explore the following observations. First, we observed complete and near-complete pathological responses after only 6 weeks of treatment with ICI in patients with high TILs. This suggests that a subgroup of TNBC could be treated with chemo-free regimens if further research powered for long-term outcome analysis confirms our results. More research is needed on the optimal selection strategy and treatment regimen, especially in view of the observed high endocrinopathy rate. It is tempting to speculate whether extending the 6-week treatment period could result in higher pCR rates and thereby reach responses similar to outcomes obtained with chemo + ICI. This can only be achieved if the accompanying toxicity does not increase; however, it remains unknown whether pCR after immunotherapy has the same prognostic value as pCR after chemotherapy. Therefore, larger trials are needed to validate the pCR rate after short-term ICI alone and to determine whether this results in excellent survival rates, as seen in other cancers11,53. Moreover, pCR might not be the optimal end point as KEYNOTE-522 and GeparNUEVO have indicated that the benefit of PD-1 blockade is not exclusively seen in patients with pCR51,54. Second, our exploratory clinical and translational data suggest that combination ICI is feasible and could potentially enhance the effects of PD-1 blockade; however, the benefit–risk ratio of such combinations should always be carefully monitored. Third, establishing the feasibility of patient inclusion based on TILs opens the door for more immune biomarker-driven trials, which is particularly important in diseases such as TNBC that include both inflamed and noninflamed tumors.
The potential integration of additional inflammation analyses, for example, using IFNG gene expression as well as TILs as suggested by our data, may optimize patient selection, increase pCR rates for ICI-only approaches and could help treatment personalization in the future. Last, a substantial fraction of patients achieved ctDNA clearance after short-term ICI. Given the strong prognostic value of early ctDNA decrease, as shown by the I-SPY trial55, future studies are needed to investigate the feasibility and reliability of TIL-informed patient inclusion and the potential of ctDNA-informed therapy adjustments.

Methods

Patients

Patients in cohorts A and B were eligible for enrollment if they were at least 18 years of age and had stage I–III (clinical tumor stage T1c-3 and nodal stage N0–3, according to the primary tumor regional lymph node staging criteria of the American Joint Committee on Cancer, 7th edition) TNBC with confirmation of estrogen receptor (ER) and HER2 negativity (ER < 10% and HER2 0, 1 or 2 in the absence of amplification as determined by in situ hybridization) on a biopsy from the primary tumor in the breast; newly diagnosed, previously untreated disease; a WHO PS score62 of 0 or 1 and adequate organ functions. The TIL percentage had to be 5% or more. To ensure balanced enrollment based on TIL levels, each cohort included five patients with low (5–10%), five patients with intermediate (11–49%) and five patients with high (≥50%) TIL levels. Patients with concurrent ipsilateral, bilateral or multifocal primary tumors were also eligible for enrollment. For cohort C, patients had to meet the same criteria, but the nodal stage had to be N0, tumor stage T1c–T2 and TILs had to be 50% or more. The intention for cohort C was to explore the potential feasibility of chemotherapy de-escalation in patients with high TILs. As withholding adjuvant capecitabine for high-risk patients and/or escalating locoregional treatment for patients with more extensive disease was undesired, cohort C included only patients who were lymph node-negative.

Exclusion criteria included history of immunodeficiency, autoimmune disease or conditions requiring immunosuppression (>10 mg d−1 prednisone or equivalent); other immunosuppressive medications intake within 28 days of study drug administration; chronic or recurring infections; occult breast cancer; fertility preservation due to breast cancer diagnosis; active hepatitis B virus or hepatitis C virus infection; clinically overt cardiovascular disease; or previous systemic anticancer treatment.

Trial design and treatments

The BELLINI trial (Preoperative Trial for Breast Cancer With Nivolumab in Combination With Novel IO; ClinicalTrials.gov registration: NCT03815890) is a single-center, nonblinded, nonrandomized, noncomparative phase 2 study designed to evaluate the feasibility and efficacy of checkpoint inhibition before regular neoadjuvant therapy or surgery in patients with primary breast cancer. Cohorts for prespecified breast cancer subgroups are opened in a sequential manner. Here we report the first three TNBC cohorts for patients who were treated with nivolumab (cohort A) or nivolumab + ipilimumab for 4 (cohort B) or 6 (cohort C) weeks. Cohort A had nivolumab monotherapy, 240 mg on day 1 (D1) and D15. Cohort B had nivolumab + ipilimumab 1 mg kg−1 on D1 and nivolumab 240 mg on D15. Cohort C had nivolumab + ipilimumab 1 mg kg−1 on D1 and D21. Regular therapy, consisting of neoadjuvant chemotherapy or primary surgery, started on D29 and onwards. Given the poor prognosis of patients with low TIL levels and the hypothesis that these women will probably not be the super-responders to ICI, patients were only eligible with TILs ≥ 5%. A threshold of 5% TILs was selected to exclude true immune-deserted tumors. Equal distribution of patients with different levels of tumor-infiltrating lymphocytes over the cohorts was ensured by inclusion of five patients with low TIL (5–10%), five patients with intermediate TIL (11–49%) and five patients with high TIL (≥50%) scores per cohort.

After cohorts A (in the protocol defined as cohort 1B) and B (in the protocol defined as cohort 2B), the protocol was amended to open cohort C (in the protocol defined as cohort 3B). Cohort C had the same inclusion criteria as cohorts A and B, except that only patients with clinically node-negative disease and TIL levels of 50% or higher could be included. With the amendment to open cohort C, the WOO design was changed into a true neoadjuvant design with all patients proceeding to surgery after the immunotherapy. After completing the interim analysis of cohorts A and B, an amendment was approved to use pCR as a primary end point instead of immune activation for cohort C and subsequent cohorts (see details on end points below).

Ethics statement

All patients provided written informed consent before enrollment. This investigator-initiated trial was designed by the Netherlands Cancer Institute (NKI).

The trial was conducted in accordance with the protocol, Good Clinical Practice standards and the Declaration of Helsinki. The full protocol, amendments and the informed consent form were approved by the medical ethical committee of the NKI.

End points

Cohorts A and B

The primary end point for cohorts A and B is immune activation following two cycles of neoadjuvant ICI, defined as a twofold increase in CD8+ T cells assessed via immunohistochemistry and/or an increase in IFNG gene expression. High-quality paired biopsies are necessary for the evaluability of this primary end point.
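
The primary end point definition above can be illustrated with a minimal sketch. It is not the trial's analysis code. The function names are hypothetical, and the twofold threshold for IFNG is an assumption taken from the Discussion, which describes the end point as a doubling of CD8+ T cells and/or IFNG expression.

```python
def fold_change(pre: float, post: float) -> float:
    """Post-treatment value divided by pretreatment value."""
    if pre <= 0:
        raise ValueError("pretreatment value must be positive")
    return post / pre


def immune_activation(cd8_pre: float, cd8_post: float,
                      ifng_pre: float, ifng_post: float,
                      cd8_fold: float = 2.0,
                      ifng_fold: float = 2.0) -> bool:
    """True if CD8+ T cells and/or IFNG expression reach the fold threshold.

    ifng_fold = 2.0 is an assumption (doubling); adjust if the protocol
    defines the IFNG increase differently.
    """
    cd8_ok = fold_change(cd8_pre, cd8_post) >= cd8_fold
    ifng_ok = fold_change(ifng_pre, ifng_post) >= ifng_fold
    return cd8_ok or ifng_ok
```

A patient counts as immune-activated if either criterion is met, which mirrors the "and/or" in the definition.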

Clinical response

As a secondary end point for cohorts A and B, we evaluated the clinical response. Clinical response is defined as having a radiological and/or pathological response.

Radiological signs of response

Radiological response was defined as at least a 30% decrease on MRI (PR according to RECIST v.1.1, not confirmed). The target (or index) lesion is defined as the largest enhancing lesion. In case of multifocality or multicentricity, the largest mass and/or nonmass enhancement was measured in the axial–sagittal or coronal plane and defined as target/index lesion. In these cases, the total area occupied by the tumor (including all masses and nonmass enhancement) was also measured. The total tumor area was used for the RECIST measurements.
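
As a worked illustration of the 30% criterion, the following sketch computes the percentage change between a baseline and a follow-up measurement. The function names are hypothetical and the inputs are assumed to be in the same unit at both time points.

```python
def percent_change(baseline: float, follow_up: float) -> float:
    """Percentage change from baseline; negative values indicate shrinkage."""
    if baseline <= 0:
        raise ValueError("baseline measurement must be positive")
    return 100 * (follow_up - baseline) / baseline


def is_partial_response(baseline: float, follow_up: float,
                        threshold: float = -30.0) -> bool:
    """True if the measurement decreased by at least 30% (unconfirmed PR)."""
    return percent_change(baseline, follow_up) <= threshold
```

For example, a lesion measuring 50 mm at baseline and 35 mm after treatment has changed by exactly −30% and would meet the criterion.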

Pathological signs of response

Pathological response could be studied in biopsies from 28 patients due to the WOO design. The absence of viable tumor after 4 weeks of therapy in the post-treatment biopsy was classified as a clinical response. For patients proceeding to surgery this was defined as partial or pCR, according to the EUSOMA criteria.

Cohort C

The primary end point for cohort C is pCR, defined as no viable tumor remaining in the breast and lymph nodes (ypT0N0)63. MPR (the secondary end point) is a frequently used surrogate end point for efficacy in neoadjuvant trials evaluating immune checkpoint blockade across cancer types8,11,26. MPR was defined as ≤10% of residual viable tumor in the surgical specimen17,64,65 or no viable tumor in the breast but residual tumor cells in the lymph nodes.
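
The pCR and MPR definitions above can be expressed as a small decision rule. This is a sketch under stated assumptions: the function name and labels are hypothetical, the residual viable tumor percentage refers to the breast specimen, and the most specific label is returned (pCR before MPR).

```python
def classify_pathological_response(residual_viable_tumor_pct: float,
                                   nodes_positive: bool) -> str:
    """Classify a surgical specimen as 'pCR', 'MPR' or 'no MPR'."""
    if not 0 <= residual_viable_tumor_pct <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # pCR: no viable tumor in breast and lymph nodes (ypT0N0).
    if residual_viable_tumor_pct == 0 and not nodes_positive:
        return "pCR"
    # MPR: <=10% residual viable tumor, which also covers no viable tumor
    # in the breast with residual tumor cells in the lymph nodes.
    if residual_viable_tumor_pct <= 10:
        return "MPR"
    return "no MPR"
```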

All cohorts (A, B and C)

Secondary end points included feasibility, safety and radiological response. Feasibility was determined based on any treatment-related complications that led to a delay in chemotherapy or primary surgery beyond 6 weeks from the start of therapy. All patients were closely monitored for AEs for 100 days after the administration of the last study treatment, following the Common Terminology Criteria for Adverse Events (CTCAE) v.5 (ref. 66). In addition, we reported all immune-related AEs in the first year of follow-up. Radiological response was assessed according to the RECIST v.1.1 guidelines, but not confirmed.

Statistical analysis

For this exploratory, hypothesis-generating study, no formal sample size calculation was performed for efficacy because there were no data on the efficacy of neoadjuvant immunotherapy in breast cancer at the time of the design of this study. For cohorts A and B, the null hypothesis of a true immune activation in ≤30% of patients was tested against a one-sided alternative. For cohort C, the design was identical except that the null hypothesis was pCR in ≤30% of patients, tested against a one-sided alternative. For 80% power, at a one-sided significance level of 0.05, 15 patients were accrued per cohort to be evaluated in the first stage. If there were 5 or fewer responses among these 15 patients, the cohort was closed for futility. Otherwise, the cohort could be expanded with 31 additional patients, reaching a total of 46. We decided to publish after stage I, which was allowed by protocol, due to the observation that very early responses to ICI without chemotherapy are possible in TNBC, which warrants efforts to de-escalate therapy for a subset of patients, in contrast to the current therapy escalation for all patients with TNBC. The median follow-up time was obtained using a reverse Kaplan–Meier method. Analyses were performed using R67 v.4.2.1.
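
The operating characteristics of such a two-stage design can be explored with a short calculation. The sketch below is illustrative only. The stage I values (15 patients, futility at 5 or fewer responses) and the total of 46 come from the text; the final rejection boundary r_total = 18 and the alternative response rate of 50% used in the usage note are assumptions that are not stated in this section.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k responses among n patients."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))


def simon_two_stage(p: float, n1: int = 15, r1: int = 5,
                    n_total: int = 46, r_total: int = 18) -> dict:
    """Operating characteristics at a true response rate p.

    r_total is an assumed boundary: the null hypothesis is rejected when
    the total number of responses exceeds r_total.
    """
    n2 = n_total - n1
    # Probability of early termination: r1 or fewer responses in stage I.
    pet = binom_cdf(r1, n1, p)
    reject = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        needed = r_total + 1 - x1  # stage II responses needed to exceed r_total
        reject += binom_pmf(x1, n1, p) * (1.0 - binom_cdf(needed - 1, n2, p))
    expected_n = n1 + (1 - pet) * n2
    return {"pet": pet, "reject": reject, "expected_n": expected_n}
```

Calling simon_two_stage(0.30) gives the probability of stopping after stage I and the type I error under the null hypothesis, and simon_two_stage(0.50) gives the power under the assumed alternative.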

Pathology assessments and IHC analyses

All patients underwent baseline tumor staging, consisting of ultrasound of the breast, axilla and periclavicular region and MRI of the breast. Positron emission tomography and computed tomography imaging was performed in all participants to confirm the clinical stage. Pretreatment tumor histological biopsies (four core biopsies, 14G needle) were taken for all patients. Post-treatment tissue was obtained either through a biopsy (three core biopsies, 14G needle) for patients continuing neoadjuvant chemotherapy (n = 28) or from the surgical specimen for those undergoing surgery right after the ICI study treatment (n = 3). Histopathological examination of biopsies and resection specimens was carried out by five experienced breast cancer pathologists (H.M.H., R.S., K.v.d.V., J.v.d.B. and N.K.). Resected tumors were examined in their entirety and regression of resected tumors was assessed by estimating the percentage of residual viable tumor of the macroscopically identifiable tumor bed, as identified on routine hematoxylin and eosin (H&E) staining. Formalin-fixed paraffin-embedded (FFPE) tissue sections were used for H&E staining and for immunohistochemical analysis of CD8 (C8/144B, DAKO), PD-L1 (22C3, DAKO) and PD-1 (NAT105, Roche Diagnostics). The percentage of tumor cells and TILs was assessed by pathologists trained for TIL assessment on H&E-stained slides according to the international standard from the International Immuno-Oncology Biomarker Working Group22 (see www.tilsinbreastcancer.org for all guidelines on TIL assessment in solid tumors). After a pathologist provided an initial TIL score, an ‘expert TIL score’ was generated as a consensus score from at least two out of four trained pathologists using slidescore.com for online scoring (www.slidescore.com). TIL scores for inclusion were scored on the diagnostic biopsy of the patient to allow for stratification of patients (low = 5–10%, intermediate = 11–49% and high ≥ 50%).
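
The TIL strata used for inclusion can be written as a simple lookup. This is a minimal sketch with a hypothetical function name; it assumes TIL scores are reported as whole percentages, so that no value falls between 10% and 11%.

```python
def til_category(til_percent: int) -> str:
    """Map a TIL score (whole percent) to the stratum used for enrollment."""
    if not 0 <= til_percent <= 100:
        raise ValueError("TIL score must be between 0 and 100")
    if til_percent < 5:
        return "ineligible"  # below the 5% inclusion threshold
    if til_percent <= 10:
        return "low"
    if til_percent <= 49:
        return "intermediate"
    return "high"
```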

Immunohistochemistry

IHC of the FFPE tumor samples was performed on a BenchMark Ultra autostainer (Ventana Medical Systems). The double stain was performed on a Discovery Ultra autostainer. In brief, paraffin sections were cut at 3 μm, heated at 75 °C for 28 min and deparaffinized in the instrument with EZ prep solution (Ventana Medical Systems). Heat-induced antigen retrieval was carried out using Cell Conditioning 1 (CC1, Ventana Medical Systems) for 48 min at 95 °C (PD-L1) or 64 min at 95 °C (PD-1/CD8 double). PD-L1 was detected using clone 22C3 (1:40 dilution, 1 h at room temperature, Agilent/DAKO, lot 11654144). Bound antibody was detected using the OptiView DAB Detection Kit (Ventana Medical Systems). Slides were counterstained with Hematoxylin and Bluing Reagent (Ventana Medical Systems).

For the double-staining PD-1 (Yellow) followed by CD8 (Purple), PD-1 was detected in the first sequence using clone NAT105 (Ready-to-Use, 32 min at 37 °C, Roche Diagnostics, lot 11654144). The PD-1-bound antibody was visualized using anti-mouse NP (Ventana Medical Systems, Ready-to-Use dispenser, lot K09956) for 12 min at 37 °C followed by anti-NP AP (Ventana Medical Systems, Ready-to-Use dispenser, lot J23971) for 12 min at 37 °C, followed by the Discovery Yellow detection kit (Ventana Medical Systems). In the second sequence of the double-staining procedure, CD8 was detected using clone C8/144B (1:200 dilution, 32 min at 37 °C, Agilent, lot 41527763). CD8 was visualized using anti-mouse HQ (Ventana Medical Systems, Ready-to-Use dispenser, lot K20711) for 12 min at 37 °C followed by anti-HQ HRP (Ventana Medical Systems, Ready-to-Use dispenser, lot K22062) for 12 min at 37 °C, followed by the Discovery Purple Detection kit (Ventana Medical Systems). Slides were counterstained with Hematoxylin and Bluing Reagent (Ventana Medical Systems). A PANNORAMIC 1000 scanner from 3DHISTECH was used to scan the slides at a ×40 magnification.

Distance analysis between tumor and CD8+ T cells

Spatial analysis was performed on the pretreatment biopsies of all included patients. The stained slides were scanned and image analysis was performed with the HALO image analysis software from Indica Labs, v.3.4.2986.185 (cohorts A and B) and v.3.6.4134 (cohort C). Within HALO, the multiplex IHC module was used to phenotype and quantify CD8+ cells. Cell segmentation was performed by the detection of hematoxylin (detection weight of 1) and PD-1 (detection weights 0.045 for cohorts A and B; and 0.5 for cohort C) and CD8 for cohort C (detection weight of 0.5) staining, utilizing a nuclear segmentation aggressiveness of 0.045. Minimal intensity thresholds to consider a cell positive for a marker were set for hematoxylin (0), PD-1 (0.25 for cohorts A and B and 0.1 for cohort C) and CD8 (0.1) separately. Biopsies were analyzed in total, while for resection specimens the analysis was restricted to representative tumor beds as annotated by a breast cancer pathologist. The quantified levels of CD8+ and PD-1+CD8+ cells were corrected for the analyzed tissue area (cells per µm2).

Artificial intelligence tumor classifiers (Object Phenotyper, HALO AI) were developed to discriminate between tumor and nontumor cells in cohorts A and B and in cohort C. Individual cells were segmented (nuclei seg BF v.1.0.0), and the classifiers were trained by annotating single cells as tumor or nontumor. The annotations were guided by marked tumor regions on H&E-stained slides by a trained breast cancer pathologist. The classifiers were finalized with 20,000 iterations and a cross-entropy of 0.009 (cohorts A and B) and >10,000 iterations and a cross-entropy of 0.021 (cohort C).

Merging the results of the multiplex IHC and tumor classifier enabled the visualization of the spatial distribution of tumor and CD8+ cells (Extended Data Fig. 1b–f). Using the nearest neighborhood analysis, the average distance between the tumor and immune cells was quantified by taking the mean of the distances between every tumor cell and its nearest cell of the above-mentioned immune phenotypes in the pretreatment biopsies (Extended Data Fig. 1f). Distances from tumor cells to the nearest CD8+ T cells were taken as a measure of proximity of CD8+ T cells to the tumor.
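
The nearest-neighbor measure described above can be sketched as follows. This is not the HALO implementation; it is a brute-force illustration with a hypothetical function name, assuming each cell is represented by its (x, y) centroid in a common unit.

```python
from math import dist


def mean_nearest_distance(tumor_cells, immune_cells) -> float:
    """Mean distance from every tumor cell to its nearest immune cell.

    Both arguments are sequences of (x, y) coordinates. The brute-force
    search is quadratic and only meant to illustrate the definition.
    """
    if not tumor_cells or not immune_cells:
        raise ValueError("both cell lists must be non-empty")
    nearest = [min(dist(t, c) for c in immune_cells) for t in tumor_cells]
    return sum(nearest) / len(nearest)
```

A lower value indicates that CD8+ T cells lie closer to tumor cells on average. For whole-slide data with many cells, a spatial index would be a more practical choice than the exhaustive search shown here.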

DNA and RNA isolation

DNA and RNA were extracted from fresh-frozen, pre- and post-treatment tumor material using the AllPrep DNA/RNA kit (QIAGEN) for frozen material, following the manufacturer’s protocol, in a QIAcube (QIAGEN). Germline DNA was isolated from patient peripheral blood mononuclear cells using the DNeasy Blood & Tissue kit (QIAGEN).

Bulk RNA sequencing

Total RNA quality control

Quality and quantity of the total RNA was assessed by the 2100 BioAnalyzer using a Nano chip (Agilent). Total RNA samples having a RIN > 8 were subjected to library generation.

TruSeq stranded mRNA library generation

Strand-specific libraries were generated using the TruSeq stranded mRNA sample preparation kit (Illumina, RS-122-2101/2) according to the manufacturer's instructions (Illumina, document no. 1000000040498 v00). In brief, polyadenylated RNA from intact total RNA was purified using oligo-dT beads. Following purification, the RNA was fragmented, random primed and reverse transcribed using SuperScript II Reverse Transcriptase (Invitrogen, part no. 18064-014) with the addition of Actinomycin D. Second-strand synthesis was performed using Polymerase I and RNaseH with replacement of dTTP for dUTP. The generated cDNA fragments were 3' end adenylated and ligated to Integrated DNA Technologies (IDT) xGen UDI(10 bp)-UMI(9 bp) paired-end sequencing adaptors (Integrated DNA Technologies) and subsequently amplified by 12 cycles of PCR. The libraries were analyzed on a 2100 BioAnalyzer using a 7500 chip (Agilent), diluted and pooled equimolar into a multiplex sequencing pool.

Sequencing

The libraries were sequenced with 54-bp paired-end reads on a NovaSeq 6000 using S1 Reagent kit v.1.5 (100 cycles) (Illumina).

Data analysis

RNA-seq data were aligned to GRCh38 with STAR68 v.2.7.1a, with the twopassMode = ‘Basic’. FPKM were obtained with RSeQC69 v.4.0.0 FPKM_count.py and subsequently normalized to transcripts per million. Data quality was assessed with FastQC70 v.0.11.5, FastQ Screen71 v.0.14.0, the Picard CollectRnaSeqMetrics72,73 and RSeQC69 v.4.0.0 read_distribution.py and read_duplication.py and was found to be suitable for the downstream analysis. TNBCtype74 was used for the Lehmann subtype classification75. The Gseapy76 v.1.0.3 ssgsea tool with the sample_norm_method = ‘rank’ was used for gene set signature scoring. For the signature analysis, P values were considered significant after FDR correction (Benjamini–Hochberg) at a 10% significance level. Data were analyzed with Python77 v.3.10.5. Pandas78,79 v.2.0.0 and numpy80 v.1.22.4 were used for data handling. Matplotlib72 v.3.5.2, seaborn81 v.0.12.2 and statannotations82 v.0.5.0 were used for plotting.
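
Two of the steps above, the FPKM-to-TPM normalization and the Benjamini–Hochberg correction, follow standard formulas that can be sketched in plain Python. The function names are hypothetical and this is not the pipeline code used in the study.

```python
def fpkm_to_tpm(fpkm):
    """Rescale the FPKM values of one sample so that they sum to one million."""
    total = sum(fpkm)
    if total <= 0:
        raise ValueError("sum of FPKM values must be positive")
    return [value / total * 1e6 for value in fpkm]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value to enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

Signatures whose adjusted P value falls below 0.10 would then be called significant at the 10% level.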

Whole-exome sequencing

For each sample, the amount of double-stranded DNA was quantified by using the Qubit dsDNA HS Assay kit (Invitrogen, cat. no. Q32851). A maximum amount of 2 μg double-stranded genomic DNA was fragmented by Covaris AFA technology to obtain fragment sizes of 200–300 bp. Samples were purified using Agencourt AMPure XP Reagent (Beckman Coulter, cat. no. A63881) in a 2× reaction volume setting according to the manufacturer’s instructions. The fragmented DNA was quantified and qualified on a BioAnalyzer system using the DNA7500 assay kit (Agilent Technologies, cat. no. 5067-1506). With a maximum input amount of 1 μg fragmented DNA, next-generation sequencing library preparation for Illumina sequencing was performed using the KAPA HTP Prep kit (KAPA Biosystems, KK8234) in combination with xGen UDI-UMI adaptors (IDT). During the library amplification step, four cycles of PCR were performed to obtain enough yield for the exome enrichment assay. All DNA libraries were quantified on a BioAnalyzer system using the DNA7500 assay kit. Exome enrichment was performed on library pools of six unique dual indexed libraries, 500 ng each, using the xGen Exome Hyb Panel v.2 (IDT, cat. no. 10005152) and xGen Hybridization Capture Core Reagents according to the manufacturer’s protocol, with hybridization time adjusted to 16 h and ten cycles of PCR performed during post-capture PCR. All exome enriched library pools were quantified on a BioAnalyzer system using the DNA7500 assay kit, pooled equimolar to a final concentration of 10 nM and subjected to paired-end 100-bp sequencing on an Illumina NovaSeq 6000 instrument using a NovaSeq 6000 S4 Reagent Kit v.1.5 (Illumina, 20028313), according to the manufacturer’s instructions.

Data analysis

Sequencing reads were aligned to the human reference GRCh38 (Ensembl, v.105) using BWA83 v.0.7.17. Duplicated reads were marked using Picard73 MarkDuplicates v.2.25.0, after which quality scores were recalibrated using GATK4 (ref. 84) BaseRecalibrator v.4.2.2.0. Single-nucleotide variants and short insertions and deletions (indels) were called using GATK4 (ref. 84) Mutect2 v.4.2.2.0 on the tumor samples matched with germline samples. Subsequently, variants were filtered by the PASS filter and annotated using Ensembl Variant Effect Predictor 105. The maftools85 v.2.10.5 package was used for the analysis. Tumor mutational burden was calculated as the total number of nonsynonymous somatic mutations with a minimal variant allele frequency of 20%. Data were analyzed with Python77 v.3.10.5 and R67 v.4.1.3. Pandas78,79 v.2.0.0 was used for data handling. maftools85 v.2.10.5, Matplotlib72 v.3.5.2, seaborn81 v.0.12.2 and statannotations82 v.0.5.0 were used for plotting.
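
The tumor mutational burden count described above can be illustrated with a small filter. This is a sketch, not the study's code. The record layout (dictionaries with 'classification' and 'vaf' keys) is hypothetical, and the set of nonsynonymous classes is an assumption based on MAF-style variant classifications; it should be checked against the annotation actually used.

```python
# Assumed MAF-style variant classes treated as nonsynonymous.
NONSYNONYMOUS = {
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site", "Translation_Start_Site",
}


def tumor_mutational_burden(variants, min_vaf: float = 0.20) -> int:
    """Count nonsynonymous somatic variants with VAF >= min_vaf.

    Each variant is a dict with 'classification' (str) and 'vaf' (0 to 1).
    Variants are assumed to have passed the PASS filter already.
    """
    return sum(
        1
        for variant in variants
        if variant["classification"] in NONSYNONYMOUS
        and variant["vaf"] >= min_vaf
    )
```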

scRNA-seq and TCR sequencing

Preparation of the single-cell suspension

Following biopsy or obtaining resection specimens, samples were rapidly processed for scRNA-seq. Samples from cohort A were minced on ice and frozen in 10% dimethylsulfoxide FCS at −80 °C. Within 4 weeks after freezing, samples were defrosted in 37 °C medium. Samples from cohort B were minced on ice and immediately processed for single-cell sequencing (not frozen), which did not result in a batch effect.

Samples were transferred to a tube containing 1 ml digestion medium containing collagenase P (2 mg ml−1, Thermo Fisher Scientific) and DNase 1 (10 U µl−1, Sigma) in RPMI (Thermo Fisher Scientific). Samples were incubated for 20 min at 37 °C and were pipetted up and down every 5 min for 30 s. Next, samples were filtered on a 40-µm nylon mesh (Thermo Fisher Scientific) and, directly afterwards, the same volume of ice-cold PBS containing 0.04% BSA was added. Following centrifugation at 300g and 4 °C for 5 min, the supernatant was removed and discarded, and the cell pellet was resuspended in red blood cell lysis buffer for 5 min at room temperature and then centrifuged again at 300g at 4 °C for 5 min. The supernatant was removed and discarded and the pellet was resuspended in PBS containing 0.04% BSA. Next, 10 μl of this cell suspension was counted using an automated cell counter (ChemoMetec NucleoCounter NC-200) to determine the concentration of live cells. The entire procedure was usually completed within 1 h and 15 min.

scRNA-seq data acquisition and preprocessing

Libraries for scRNA-seq were generated using the Chromium Single Cell 5′ library and Gel Bead & Multiplex kit from 10x Genomics. We aimed to profile 10,000 cells per library if a sufficient number of cells was retained during dissociation. All libraries were sequenced on a HiSeq4000 or NovaSeq 6000 until sufficient saturation was reached.

Data analysis

After quality control, raw sequencing reads were aligned to the human reference genome GRCh38 and processed to a matrix representing the unique molecular identifier counts per cell barcode per gene using Cell Ranger (10x Genomics, v.2.0). The data were analyzed with scanpy86 v.1.9.3 and Seurat87 v.3. Cellbender88 v.0.3.0 was used for eliminating technical artifacts and cells above the quality cutoff of 0.5 were filtered out. Cells with mitochondrial RNA content >0.25, a number of genes <200 or >6,000, or <400 counts were filtered out. After normalization, regression for the number of unique molecular identifiers, percentage mtRNA, sample ID, cell cycle, hypoxia, interferon content and cell stress was performed on the 2,000 most variable genes followed by principal-component analysis. Next, a Uniform Manifold Approximation and Projection (UMAP) was generated and clustering was performed at a resolution of 0.2 using the 30 most informative components. Major cell types were identified based on canonical marker genes.
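
The per-cell quality thresholds listed above can be summarized in a small predicate. This is an illustrative sketch rather than the scanpy or Seurat code used in the study; the function name and the dictionary keys are hypothetical.

```python
def passes_qc(cell: dict,
              max_mito_fraction: float = 0.25,
              min_genes: int = 200,
              max_genes: int = 6000,
              min_counts: int = 400) -> bool:
    """Return True if a cell is kept under the stated thresholds.

    'cell' holds 'mito_fraction' (0 to 1), 'n_genes' and 'n_counts'.
    """
    if cell["mito_fraction"] > max_mito_fraction:
        return False  # too much mitochondrial RNA
    if cell["n_genes"] < min_genes or cell["n_genes"] > max_genes:
        return False  # too few genes, or suspiciously many
    if cell["n_counts"] < min_counts:
        return False  # too few counts
    return True
```

Applying the predicate to every barcode, for example with a list comprehension, yields the set of cells carried forward to normalization.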

For T cell subclustering, the T cells were selected from the full Seurat object and the analysis described above was repeated with ten principal components based on the elbow plot. Clusters were identified at a resolution of 0.6 and were annotated based on breast cancer tissue-specific marker genes89. Cells expressing markers of other cell types (immunoglobulins and hemoglobin) were filtered out. Principal-component analysis was calculated on highly variable genes with k = 30. Clustering was performed with Phenograph90 with k = 30. Cluster identification was performed based on canonical marker genes. Signature scores were calculated with sc.tl.score_genes. Groups were compared using sc.tl.rank_genes_groups, with method = ‘wilcoxon’ and use_raw = True. EnrichR91,92 was used for the pathway enrichment analysis. Activated Treg cells were defined based on the level of CD137 gene expression >0.5 in the Treg cell population. PD-1+Ki-67+CD4+ cells were defined based on the level of MKI67 gene expression >0 in the TFH cell population. Scirpy93 v.0.11.2 was used for the TCR analysis. Clonotypes were defined based on the amino acid structure. Clonality was calculated as (1 − normalized Shannon entropy). Data were analyzed with Python77 v.3.10.5. Pandas78,79 v.2.0.0 and numpy80 v.1.22.4 were used for data handling. Matplotlib72 v.3.5.2, seaborn81 v.0.12.2, sc-toolbox94 v.0.12.3 and statannotations82 v.0.5.0 were used for plotting.
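
The clonality measure, 1 − normalized Shannon entropy, can be made concrete with a short function. This is a sketch and not the Scirpy implementation. The function name is hypothetical, the entropy is normalized by the logarithm of the number of clonotypes, and a repertoire with a single clonotype is assigned a clonality of 1 by assumption.

```python
from math import log


def clonality(clone_sizes) -> float:
    """1 - normalized Shannon entropy of clonotype frequencies.

    clone_sizes holds the number of cells per clonotype. A value of 0
    means all clonotypes are equally frequent; values near 1 indicate
    dominance by a few expanded clones.
    """
    sizes = [size for size in clone_sizes if size > 0]
    if not sizes:
        raise ValueError("at least one clonotype with cells is required")
    if len(sizes) == 1:
        return 1.0  # assumption: a single clonotype is fully clonal
    total = sum(sizes)
    entropy = -sum((s / total) * log(s / total) for s in sizes)
    return 1.0 - entropy / log(len(sizes))
```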

ctDNA analysis

A proprietary bioinformatics tissue variant calling pipeline was used to select a set of 16 high-ranked, patient-specific, somatic, clonal single-nucleotide variants from whole-exome sequencing. The Signatera amplicon design pipeline was used to generate multiplex PCR (mPCR) primer pairs for the given set of 16 variants. For cfDNA library preparation, up to 20,000 genome equivalents of cfDNA from each plasma sample were used. The cfDNA was end-repaired, A-tailed and ligated with custom adaptors, then amplified (20 cycles) and purified using Ampure XP beads (Agencourt/Beckman Coulter). A proprietary mPCR methodology was used to run patient-specific assays. The mPCR products were sequenced on an Illumina HiSeq 2500 Rapid Run (50 cycles) using the Illumina Paired End v.2 kit with an average read depth of >100,000× per amplicon. All paired-end reads were merged using Pear v.0.9.8 software and mapped to the hg19 reference genome with Novoalign v.2.3.4 (http://www.novocraft.com/). Plasma samples with at least two variants with a confidence score above a predefined algorithm threshold were defined as ctDNA-positive.
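The final positivity rule (at least two variants above a confidence threshold) can be expressed as a short sketch. The scoring algorithm and its threshold are proprietary and not given in the text, so the `is_ctdna_positive` helper, the threshold value of 0.97 and the example scores below are all hypothetical placeholders.

```python
def is_ctdna_positive(confidence_scores, threshold, min_variants=2):
    """Call a plasma sample ctDNA-positive if at least `min_variants`
    of the tracked variants have a confidence score above `threshold`.

    The actual scoring algorithm and threshold are not described in the
    text; only the 'at least two variants' rule is taken from it.
    """
    n_detected = sum(1 for score in confidence_scores if score > threshold)
    return n_detected >= min_variants


# Hypothetical threshold and per-variant scores for a 16-variant assay.
HYPOTHETICAL_THRESHOLD = 0.97

sample_a = [0.99, 0.98] + [0.10] * 14  # two variants above threshold
sample_b = [0.99] + [0.10] * 15        # only one variant above threshold

call_a = is_ctdna_positive(sample_a, HYPOTHETICAL_THRESHOLD)
call_b = is_ctdna_positive(sample_b, HYPOTHETICAL_THRESHOLD)
```

Requiring two independent variants rather than one makes the call robust to a single spurious detection.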

Flow cytometry of fresh blood

Flow cytometry was performed as previously described95. In brief, fresh blood samples were processed and analyzed within 24 h after blood draw. Peripheral blood was collected in EDTA vacutainers (BD) and subjected to red blood cell lysis (lysis buffer: dH2O, NH4Cl, NaHCO3 and EDTA). Cells were suspended in PBS containing 0.5% BSA and 2 mM EDTA and counted using the NucleoCounter NC-200 (Chemometec) automated cell counter. To obtain absolute white blood cell counts per ml of human blood, the total number of post-lysis cells was divided by the volume (ml) of blood obtained from the patient. For surface antigen staining, cells were first incubated with human FcR Blocking Reagent (1:100 dilution, Miltenyi) for 15 min at 4 °C and then incubated with fluorochrome-conjugated antibodies for 30 min at 4 °C. For intracellular antigen staining, cells were fixed with Fixation/Permeabilization solution 1× (Foxp3/Transcription Factor Staining Buffer Set, eBioscience) for 30 min at 4 °C and stained with fluorochrome-conjugated antibodies in Permeabilization buffer 1× (eBioscience) for 30 min at room temperature. Viability was assessed by staining with either 7AAD staining solution (1:10 dilution; eBioscience) or Zombie Red Fixable Viability kit (1:800 dilution, BioLegend). Data acquisition was performed on an LSRII SORP flow cytometer (BD Biosciences) using Diva software and data analysis was performed using FlowJo v.10.6.2. The gating strategy is displayed in Extended Data Fig. 5a.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.