Introduction

Breast cancer (BC) is the most frequently diagnosed malignancy in the world [1]. Recent advances in BC diagnosis and treatment modalities have enabled early diagnosis of BC and improved survival. Nevertheless, many BC patients have a dismal prognosis [2,3].

Next-generation sequencing provides insight into the genetic history of cancers. For BC, the most frequently observed mutations in The Cancer Genome Atlas (TCGA) are PIK3CA and TP53 somatic mutations; other genetic studies have described alterations of BC driver genes including MYC, CCND1, PTEN, and ERBB2 [4,5,6].

These discoveries of genetic alterations in BC heralded the era of precision medicine and targeted therapy. Amplification of ERBB2, a traditional BC biomarker, is treated with human epidermal growth factor receptor-2 (HER-2)-targeted agents such as trastuzumab, an anti-HER-2 monoclonal antibody [7,8,9,10]. In addition, the PIK3CA inhibitor alpelisib is now approved for treatment of patients with hormone receptor (HR)-positive metastatic BC harboring PIK3CA hotspot mutations [11,12,13]. ESR1 mutations, a mechanism of resistance to aromatase inhibitors, can be targeted with new-generation selective estrogen receptor degraders [14,15,16].

Recent pan-cancer genomic data have revealed that, among various cancer types, BC most commonly harbors structural variants (SVs) [17]. These SVs, including fusions, can be mechanisms of resistance to therapy and can also be therapeutic targets. For example, neurotrophic tyrosine receptor kinase (NTRK) gene fusions (NTRK1, NTRK2, or NTRK3) are oncogenic drivers in various tumor types [18] that can be targeted by recently developed TRK inhibitors [19,20], and ESR1 fusions are a mechanism of resistance to endocrine therapy in HR-positive BC [21]. However, SVs, especially fusions, in early BCs (EBCs) have rarely been reported, and the prognostic role of fusions remains unknown.

In this study, we performed SV analysis using transcriptome sequencing of 297 EBC samples. We evaluated fusions by BC subtype and investigated the relationship between fusions and other genetic alterations. Lastly, we analyzed the prognostic value of fusions in EBC patients.

Results

Patients and tissue collection

We extracted RNA and performed RNAseq of tissue samples from 298 BCs enrolled in two translational studies (Fig. 1). Due to quality control failure in one sample, we evaluated RNAseq data from 297 early BCs.

Fig. 1

Consort diagram (n = 297).

Baseline characteristics of tissue samples are described in Table 1. In the 297 BCs, the median age at diagnosis was 39.9 years (interquartile range [IQR]: 35.5–49.4); 81.1% of samples were tissue biopsies collected in the neoadjuvant chemotherapy (NAC) setting and 18.9% were surgical specimens collected in the adjuvant setting. All BC specimens were harvested from the breast. With regard to BC subtype, 51.2% were triple-negative BC (TNBC), 22.6% HR+HER2-, 14.1% HR+HER2+, and 12.1% HR-HER2+. In the NAC setting, 27.4% were HR+HER2- BC, 32.4% were HER2+ regardless of HR status, and 40.2% were TNBC, whereas 98.2% of the young breast cancer (YBC) cohort were TNBC. We further evaluated the intrinsic subtype through PAM50 analyses: luminal A was identified in 17.5% of samples, luminal B in 10.1%, basal-like in 49.5%, HER2-enriched in 18.5%, and normal-like in 4.4%.

Table 1 Sample characteristics (N = 297)

Among the 297 BCs, we collected follow-up survival data and treatment information for 197 BC patients (Table 1). Of these 197 patients, 161 were treated with NAC followed by curative surgery and 52 were treated with surgery followed by adjuvant treatment according to BC subtype. Among the 161 BCs in the NAC cohort, 19.3% were clinical stage IIIC at diagnosis and 57.4% were TNBC (Table 1). Details of BC subtypes and intrinsic subtypes are described in Supplementary Tables 2 and 3.

Fusions according to BC characteristics

We analyzed the RNAseq of 297 tissue samples using three fusion detection software programs (Supplementary Table 4). First, we included fusions with more than three supporting reads (Fig. 1). We excluded fusions identified by only one program, artifacts, and fusions found in normal tissue.

Each caller detected a median of five to eight fusions per sample (Supplementary Table 4). Among the three callers, STAR.Arriba detected the most fusions (median number of fusions: 8, IQR: 4–14), followed by STAR.Fusion (median: 7, IQR: 4–13), and STAR.SEQR detected the fewest (median: 5, IQR: 3–9). The median number of detected fusions per BC sample after filtering was 5 (IQR: 3–9) (Supplementary Table 5).

We also evaluated fusions according to BC subtype (Fig. 2a). HR+HER2- BC had the fewest fusions (median: 7, IQR: 3–15) compared to TNBC (median: 9, IQR: 4.75–14), HR-HER2+ BC (median: 9.5, IQR: 5.75–18.25), and HR+HER2+ BC (median: 10, IQR: 6–13), although the difference was not statistically significant (p = 0.16). By intrinsic subtype, the normal-like subtype had the fewest fusions (median: 1, IQR: 0–3), followed in ascending order by the luminal A (median: 5.5, IQR: 2.75–10.25), luminal B (median: 9, IQR: 6–16.5), HER2-enriched (median: 9, IQR: 6–16.5), and basal-like (median: 10, IQR: 6–15.5) subtypes (p < 0.05) (Fig. 2b).

Fig. 2: Number of fusions by breast cancer subtypes.

Number of fusions according to (a) immunohistochemical breast cancer (BC) subtype, (b) PAM50 intrinsic subtype, and (c–f) PAM50 intrinsic subtype within each breast cancer subtype (n = 297).

Further intrinsic subtype analysis showed that the basal-like subtype had more fusions than other intrinsic subtypes in HR+HER2- BC (median number of fusions in the basal-like subtype: 17.5, IQR: 9.75–18.75) (Fig. 2c) and in TNBC (median: 10, IQR: 6–15) (Fig. 2f) (p < 0.05 for each). However, in HER2+ BC, regardless of HR status, the number of fusions did not differ by intrinsic subtype (p > 0.05 for each) (Fig. 2d, e).

Other genomic characteristics, including homologous recombination deficiency (HRD) score, tumor mutational burden (TMB), and copy number variant (CNV) burden, were also evaluated for association with the number of fusions in the 126 BCs that underwent whole exome sequencing (WES) (Fig. 3a–c and Supplementary Tables 2 and 3). In this analysis, high HRD score, high TMB, and high CNV burden were positively correlated with the number of fusions (p = 0.010, p = 0.003, and p = 0.035, respectively). In subgroup analyses by BC subtype, CNV burden was associated with the number of fusions in the HR+HER2- subtype and TNBC, whereas HRD score was associated with the number of fusions in the HR-HER2+ subtype (Fig. 3d–o).

Fig. 3: HRD, TMB and CNV burden according to number of fusions in breast cancer.

a Homologous recombination deficiency (HRD) score, b tumor mutational burden (TMB), and c copy number variant (CNV) burden between high (n = 45) and low (n = 81) number of fusions in all breast cancers (BC) (n = 126 with both whole transcriptome sequencing [WTS] and whole exome sequencing). d HRD score, e TMB, and f CNV burden between high (n = 10) and low (n = 23) number of fusions in HR+HER2- BC. g HRD score, h TMB, and i CNV burden between high (n = 7) and low (n = 16) number of fusions in HR+HER2+ BC. j HRD score, k TMB, and l CNV burden between high (n = 8) and low (n = 8) number of fusions in HR-HER2+ BC. m HRD score, n TMB, and o CNV burden between high (n = 20) and low (n = 34) number of fusions in TNBC.

Frequent fusions in early breast cancer

After filtering, we found 2439 unique fusions (Table 2). Among these events, there were 515 (21.1%) recurrent events, 365 (15.0%) known cancer-related fusions, and 131 (5.4%) known BC-related fusions according to public databases including Mitelman and FusionAnnotator, which contains ChimerDB, COSMIC, and TCGA fusions [22,23]. Among the 23 chromosomes, chromosome 17 had the most fusions, followed by chromosomes 1, 11, and 8. In addition, intrachromosomal fusions were detected more frequently than interchromosomal fusions (Table 3). The four chromosomes harboring the most fusions also harbored up to 70% of intrachromosomal fusions. FBXL20, BCAS3, ERBB2, and IKZF3 were the most frequently detected fusion genes on chromosome 17. In total, 77 (3.2%) fusions were known recurrent fusions. The most commonly detected fusion event was FSIP1-AC013652.1 (Supplementary Table 6).

Table 2 Number of known and unknown fusions
Table 3 Number of inter/ intra-chromosomal fusions by chromosome

We also analyzed the chromosomal distribution of fusions according to BC subtype and intrinsic subtype (Fig. 4). By BC subtype, fusions in HR+HER2+ BC mostly occurred on chromosome 17, while fusions in TNBC were most often observed on chromosome 1 but were otherwise evenly distributed across all chromosomes (p < 0.05) (Fig. 4a, c). By intrinsic subtype, fusions in the basal-like subtype mostly occurred on chromosome 1, while those in the other subtypes mostly occurred on chromosome 17 (p < 0.05) (Fig. 4b, d).

Fig. 4: Proportion of fusion events in chromosomes.

a The proportion of fusions by chromosome in HR+HER2- breast cancer (BC), HR+HER2+ BC, HR-HER2+ BC, and triple-negative breast cancer (TNBC). b The proportion of fusions by chromosome in the luminal A, luminal B, HER2-enriched, basal-like, and normal-like intrinsic subtypes. c Circos plot of fusions in HR+HER2- BC, HR+HER2+ BC, HR-HER2+ BC, and TNBC. d Circos plot of fusions in the luminal A, luminal B, HER2-enriched, basal-like, and normal-like intrinsic subtypes.

Survival outcomes according to number of fusions

We further evaluated treatment outcomes according to fusions (Fig. 5). Only the 197 BC patients with follow-up survival data were included in this analysis. For survival analysis, we divided BCs into two groups according to the number of fusions using a cut-off value of 0.6 (the 60th percentile of fusion counts, corresponding to 11 fusions). This cut-off value was chosen based on log-rank tests with consecutive cut-off values in all BC patients (Supplementary Table 7).
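The grouping step can be illustrated with a minimal sketch. It assumes that the 0.6 cut-off is a quantile of the per-patient fusion counts computed with linear interpolation (the convention R calls type 7) and that patients strictly above the cut-off are labeled high; neither detail is stated in the text, and the patient identifiers and counts below are invented for illustration.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile (assumed convention; R calls it type 7)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def split_by_quantile(fusion_counts, q=0.6):
    """Return (cutoff, {patient: 'high' | 'low'}).

    Assumption: 'high' means strictly more fusions than the cutoff.
    """
    cutoff = quantile(list(fusion_counts.values()), q)
    groups = {
        patient: ("high" if count > cutoff else "low")
        for patient, count in fusion_counts.items()
    }
    return cutoff, groups


# Hypothetical fusion counts for six patients.
counts = {"P1": 2, "P2": 5, "P3": 8, "P4": 11, "P5": 14, "P6": 20}
cutoff, groups = split_by_quantile(counts, q=0.6)
print(cutoff, groups)
```

With these toy counts the cut-off falls at 11 fusions, so only P5 and P6 are assigned to the high group.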

Fig. 5: Treatment outcomes and number of fusion events.

a Number of fusions in the non-pathologic complete response (non-pCR) (n = 109) and pCR (n = 52) groups in all subtypes. b Number of fusions in the non-pCR (n = 31) and pCR (n = 7) groups in HR+HER2- BC. c Number of fusions in the non-pCR (n = 13) and pCR (n = 13) groups in HR+HER2+ BC. d Number of fusions in the non-pCR (n = 10) and pCR (n = 10) groups in HR-HER2+ BC. e Number of fusions in the non-pCR (n = 55) and pCR (n = 22) groups in triple-negative breast cancer (TNBC). f Kaplan-Meier (KM) curves for event-free survival (EFS) according to high vs. low number of fusions (cut-off value: 0.6) in all subtypes (n = 197). g KM for EFS in HR+HER2- BC (n = 38). h KM for EFS in HR+HER2+ BC (n = 26). i KM for EFS in HR-HER2+ BC (n = 20). j KM for EFS in TNBC (n = 113).

Among these 197 patients, 143 patients in the NAC cohort were evaluated for pathologic complete response (pCR) according to fusions (Fig. 5a–e). In this analysis, the number of fusions did not differ significantly by pCR status across all patients (p = 0.23, Fig. 5a). By BC subtype, a higher number of fusions was observed in the pCR group than in the non-pCR group in the HR-HER2+ BC subgroup (p = 0.031) (Fig. 5d), whereas HER2-enriched intrinsic subtype tumors with pCR had fewer fusions than those without pCR (p = 0.021) (Supplementary Fig. 1C). Moreover, luminal A subtype tumors with pCR had more fusions than those without pCR (p = 0.045) (Supplementary Fig. 1B). Otherwise, the number of fusions did not differ significantly by pCR status in other BC subtypes and intrinsic subtypes (Fig. 5b–e and Supplementary Fig. 1B, D).

Furthermore, we performed survival analysis in the 197 EBC patients, with a median follow-up of 7 years (Fig. 5f–j). The five-year event-free survival (5Y-EFS) rate was 75.6% in all patients (95% confidence interval [CI]: 0.699, 0.819); the 5Y-EFS rate was 68.1% in the high fusion group (n = 72) and 80.0% in the low fusion group (n = 125) (p = 0.024) (Fig. 5f).
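A five-year EFS rate of this kind is the value of the Kaplan–Meier curve at five years. The following is a minimal, dependency-free sketch of that estimator, not the survminer-based analysis used in the study; the follow-up times and event flags are invented, and confidence intervals are omitted.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times:  follow-up in years for each patient
    events: True if an event was observed, False if the patient was censored
    Returns a list of (event_time, survival_probability) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def survival_at(curve, t):
    """Read the step function at time t (1.0 before the first event)."""
    value = 1.0
    for event_time, survival in curve:
        if event_time > t:
            break
        value = survival
    return value


# Hypothetical cohort of six patients.
times = [1, 2, 3, 4, 6, 7]
events = [True, False, True, False, True, False]
curve = kaplan_meier(times, events)
print(survival_at(curve, 5))
```

In this toy cohort two events occur before year five (with six and then four patients at risk), so the estimate is 5/6 × 3/4 = 0.625.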

In survival analysis by BC subtype, the 5Y-EFS of all TNBC patients (n = 113) was 77.9% (95% CI: 0.706, 0.859); TNBC with a high number of fusions (n = 43) had a 5Y-EFS of 65.1%, compared with 85.7% for TNBC with a low number of fusions (n = 70) (p = 0.013) (Fig. 5j). In non-TNBCs, a high number of fusions tended to be associated with poor EFS, but the association was not statistically significant (Fig. 5g–i).

Of the five intrinsic subtypes, we analyzed EFS according to fusions in four, because only five cases were of the normal-like intrinsic subtype. In the basal-like intrinsic subtype (n = 112), the five-year EFS rate was 78.6% (95% CI: 0.713, 0.866). The basal–high fusion group had a five-year EFS of 64.6% (n = 40) versus 89.1% in the basal–low fusion group (n = 72) (p = 0.003) (Supplementary Fig. 1H), whereas there was no relationship between fusions and EFS among non-basal-like intrinsic subtypes (Supplementary Fig. 1E–G).

To validate our data, we evaluated the association between fusions and 5Y-EFS rate in TNBC and basal-like intrinsic subtypes using TNBC RNAseq data from Fudan University Shanghai Cancer Center (FUSCC). In total, RNAseq data from 115 TNBCs in the FUSCC TNBC cohort were available. In this validation cohort, the median age of patients at BC diagnosis was 54.0 years (IQR: 46.5, 61.0) and only twelve patients were under 40 years of age, a significant difference from our cohort (p < 0.005) (Fig. 6a). The FUSCC cohort had a lower mean sequencing depth than our cohort (p < 0.005) and fewer fusions than our TNBCs (median fusions: 3, IQR: 1, 5) (p < 0.005) (Fig. 6b, c). Regarding other clinical characteristics, information on treatment setting (neoadjuvant or adjuvant) was not available.

Fig. 6: Validation and comparison between our cohort and FUSCC cohort.

Validation study with Fudan University Shanghai Cancer Center (FUSCC) TNBC. a Comparison of age between SMC TNBC at diagnosis (n = 113) and FUSCC TNBC at surgery (n = 115). b Whole transcriptome sequencing depth in SMC TNBC (n = 113) and FUSCC TNBC (n = 113). c Number of fusions in SMC TNBC (n = 113) and FUSCC TNBC (n = 113). d Kaplan-Meier (KM) curves of event-free survival (EFS) for SMC (n = 113) and FUSCC TNBCs (n = 115). e KM for EFS of FUSCC TNBC according to high vs. low number of fusions (SMC cutoff value: 0.6, n = 115).

EFS was similar between the FUSCC cohort and our cohort (Fig. 6d). Only three FUSCC TNBCs reached the eleven-fusion cutoff; therefore, no significant EFS difference was observed (p = 0.304), even though these three had a lower 5Y-EFS rate than the others in the FUSCC cohort (Fig. 6e). Further analyses using different fusion cutoff values were performed in the FUSCC cohort, and the results consistently showed that a higher number of fusions in TNBC was a surrogate marker of shorter EFS compared with fewer fusions (Supplementary Fig. 3).

Immune status according to fusions

Next, we analyzed the association between ESTIMATE ImmuneScore and the number of fusions. Across all patients, the high fusion group had a lower ImmuneScore than the low fusion group (p < 0.001) (Fig. 7a). The TNBC–high fusion group had a lower ImmuneScore (median: 1079, IQR: 514, 1761) than the TNBC–low fusion group (median: 1673, IQR: 1057, 2809) (p < 0.001) (Fig. 7e), but no such relationship was observed in non-TNBC subtypes (Fig. 7b–d). Likewise, ImmuneScore was related to the number of fusions in the basal-like intrinsic subtype (p = 0.0016, Supplementary Fig. 2D) but not in non-basal intrinsic subtypes (Supplementary Fig. 2A–C). In survival analysis by ImmuneScore, the high ImmuneScore group had better EFS than the low ImmuneScore group (p = 0.002, Fig. 7f). TNBC patients with a high ImmuneScore had better EFS than those with a low ImmuneScore (five-year EFS of TNBC with high vs. low ImmuneScore: 91.9% [95% CI: 0.835, 1.00] vs. 71.1% [95% CI: 0.616, 0.820]) (Fig. 7j). This trend was also observed in the basal-like intrinsic subtype (p = 0.019, Supplementary Fig. 2H). However, ImmuneScore did not affect EFS in non-TNBCs (Fig. 7g–i) or in non-basal intrinsic subtypes (Supplementary Fig. 2E–G).

Fig. 7: Survival outcome and ImmuneScore regarding number of fusion events.

a ESTIMATE ImmuneScore between high (n = 104) and low (n = 193) fusion events in all subtypes. b ESTIMATE ImmuneScore between high (n = 19) and low (n = 48) fusion events in HR+HER2- BC. c ESTIMATE ImmuneScore between high (n = 15) and low (n = 27) fusion events in HR+HER2+ BC. d ESTIMATE ImmuneScore between high (n = 16) and low (n = 20) fusion events in HR-HER2+ BC. e ESTIMATE ImmuneScore between high (n = 54) and low (n = 98) fusion events in triple-negative breast cancer (TNBC). f Kaplan-Meier (KM) curves for event-free survival (EFS) according to ESTIMATE ImmuneScore in all subtypes. g KM for EFS in HR+HER2- BC. h KM for EFS in HR+HER2+ BC. i KM for EFS in HR-HER2+ BC. j KM for EFS in TNBC.

Discussion

In this study, we searched for fusions in tissue samples from 297 EBCs and found that a higher number of fusions was associated with shorter EFS in EBC, especially in TNBC or basal-like intrinsic subtypes. The median number of fusions was nine, and the incidence of fusions varied not only by BC subtype but also by intrinsic subtype. Among these fusions, 40 events were included in both the Mitelman and COSMIC fusion databases. Based on the Mitelman database, 208 of the detected fusions have previously been found in various cancers and 90 specifically in BC. Among these 90 fusions, 58 were found in TNBC, 15 in ER+HER2-, 12 in ER+HER2+, and 11 in ER-HER2+ BC. By intrinsic subtype, there were 60 fusions in basal-like, 12 in luminal A, 8 in luminal B, and 18 in HER2-enriched subtypes.

Gene rearrangement is not frequently reported in BC, especially EBC [24,25]. Previous studies on fusions in BC suggested that the HR-HER2+ BC and TNBC subtypes showed more frequent fusions than the HR+HER2- and HR+HER2+ subtypes. They also reported only a small number of fusions in BC compared to other types of cancer. In our study, HR+HER2- BC had slightly fewer fusions than other BC subtypes. By intrinsic subtype, the basal-like subtype had the most fusions, followed by the HER2-enriched and luminal B subtypes, whereas the luminal A and normal-like subtypes had few fusions.

In addition to the number of fusions, the loci of fusions also depended on intrinsic subtype. In basal-like BC, fusions most commonly occurred on chromosome 1 but were also observed across all chromosomes, whereas other intrinsic subtypes harbored fusions mostly on chromosome 17. A recent study revealed that translocations were caused by oncogene amplification as an early genetic structural alteration event. Specifically, its analysis of ERBB2 amplification suggested that ERBB2 translocation was an interchromosomal event [26]. Our research also indicated that ERBB2 amplification was related to fusions on chromosome 17. However, intrachromosomal events were observed more frequently than interchromosomal events in our study. In the TCGA cohort, 1.4% of ERBB2 fusions occurred not in HR-HER2+ BC, but in other subtypes [27,28]. ERBB2 fusion has also been observed in non-small-cell lung cancer (NSCLC) [29], accounting for 0.3% of ERBB2 fusions. The pan-HER tyrosine kinase inhibitor afatinib has been used to effectively treat NSCLC harboring ERBB2 fusions.

ESR1 fusions were observed in two HR+HER2- BCs and one HR+HER2+ BC. By intrinsic subtype, two were luminal B and one was HER2-enriched. All were intrachromosomal events, and two of the three fusions occurred in the ligand-binding domain of ESR1. In the TCGA PanCancer Atlas Project, fusions affecting ESR1 were infrequent in BC (0.8%), and partner genes varied [27]. A recent study also suggested that the most recurrent luminal B subtype-enriched fusions in the metastatic setting involved ESR1, and that 5% of HR-positive, treatment-refractory metastatic BC (MBC) harbored ESR1 fusions [30]. A previous functional study of ESR1 fusions in HR+ MBC suggested that the fusion transcriptome triggered endocrine resistance and promoted metastasis [21]. Notably, two of the three patients with ESR1 fusions experienced BC recurrence within five years; therefore, ESR1 fusions may act as a primary resistance mechanism to adjuvant endocrine therapy in HR-positive EBC.

We also found two NTRK fusions, CCL28-NTRK1 and NTRK2-BANCR, in TNBC. Recent studies of NTRK fusions focused on treatment response to new TRK inhibitors in tumors harboring NTRK fusions [31,32]. These drugs were effective for treating all types of cancer harboring NTRK fusions. Although NTRK fusion is a rare genetic alteration in BC, TRK inhibitors would likely be similarly effective in such cases.

We found that fusions were associated with EFS in EBC patients. Specifically, there was a significant association between the number of fusions and five-year EFS in TNBC and the basal-like subtype. Previous genomic studies suggested that tumor mutation burden was associated with survival outcome, but this also depended on nodal stage [33]. Previous studies as well as this study have suggested that immune signature is related to survival outcomes of TNBC, but this signature was calculated from RNA expression data and is difficult to reproduce in other TNBC cohorts [33,34]. Fusions in TNBC were not associated with HRD score or TMB in our study, suggesting that the number of fusions itself was associated with EFS in TNBC. Our finding was also observed in the RNAseq data of the FUSCC validation cohort, even though that cohort had fewer fusions. The difference in the number of fusions between the two cohorts may depend on sequencing depth. Despite these differences between the two cohorts, the trend toward shorter EFS with more fusions was consistent.

Lastly, ImmuneScore was negatively correlated with the number of fusions in TNBC and basal-like intrinsic subtypes. In our study, CNV burden was also related to the number of fusions, suggesting that genomic alterations including CNV burden and SVs might be associated with the tumor microenvironment.

Our cohort included high-stage EBCs that required neoadjuvant chemotherapy and young patients, who have a worse prognosis than older patients [6]. In terms of BC subtype, approximately 50% of the tumors in this study were TNBC. Therefore, our cohort had relatively worse survival outcomes than other EBC cohorts, although all patients had received cytotoxic chemotherapy in the neoadjuvant or adjuvant setting. In addition, false-positive fusion calls may exist even though we used three software programs for calling fusions and applied strict filtering. Nevertheless, our study describes the genomic structural characteristics of EBC with unfavorable survival outcomes. In conclusion, we investigated structural variants in tumors from EBC patients. Consistent with previous studies, the median number of fusions was lower than ten, but TNBC and the basal-like intrinsic subtype harbored more fusions than other subtypes. In addition, fusions occurred across various chromosomes in TNBC, and survival outcomes were associated with the number of fusions. Further functional validation is warranted to confirm the role of these fusions.

Methods

Tissue collection

We collected BC tissue samples from patients who participated in exploratory trials at Samsung Medical Center from December 2013 to June 2020. The institutional review board of Samsung Medical Center approved the study protocol (IRB No: 2022-05-004) and participants provided written informed consent to take part in the study (Supplementary Table 1). This study was performed in accordance with the Declaration of Helsinki [6,35]. In the NAC cohort, patients received NAC (four cycles of adriamycin and cyclophosphamide followed by four cycles of docetaxel), with trastuzumab and/or pertuzumab for HER2+ BC. In the YBC cohort, 16 patients received NAC and 38 patients underwent curative surgery followed by adjuvant chemotherapy per protocol. Radiotherapy, endocrine therapy, and targeted therapy for HER2+ BC were also performed per standard therapeutic guidelines.

All available hematoxylin and eosin-stained slides for fresh-frozen tissues were collected. All pathology specimens were reviewed by independent pathologists to determine tumor histology and immunohistochemical (IHC) findings (estrogen receptor [ER] and progesterone receptor [PgR] expression and HER2 overexpression). ER and PgR positivity were each defined as an Allred score of 3–8 based on IHC using antibodies to ER (Immunotech, Marseille, France) and PgR (Novocastra Laboratories Ltd., Newcastle upon Tyne, UK), respectively; the presence of either was defined as HR positivity. HER2 status was evaluated using a specific antibody (Dako, Glostrup, Denmark), fluorescence in situ hybridization (FISH), or silver in situ hybridization (SISH). Grade 0/1 HER2 on IHC was defined as a negative result, and grade 3 was defined as a positive result. Amplification of HER2 was confirmed by FISH or SISH if HER2 was rated as grade 2 on IHC. TNBC was defined as a negative result for ER/PgR and HER2.
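The receptor definitions above amount to a small decision rule, sketched below. The function names and argument layout are hypothetical and chosen only for illustration; the thresholds (Allred score of 3–8 for ER or PgR, IHC grade 0/1 negative, grade 3 positive, grade 2 resolved by FISH or SISH) are the ones stated in the text.

```python
from typing import Optional


def hr_positive(er_allred: int, pgr_allred: int) -> bool:
    """HR positivity: ER or PgR with an Allred score of 3-8."""
    return er_allred >= 3 or pgr_allred >= 3


def her2_positive(ihc_grade: int, ish_amplified: Optional[bool] = None) -> bool:
    """HER2 status from IHC grade; FISH/SISH is consulted only for grade 2."""
    if ihc_grade in (0, 1):
        return False
    if ihc_grade == 3:
        return True
    if ihc_grade == 2:
        if ish_amplified is None:
            raise ValueError("IHC grade 2 requires a FISH or SISH result")
        return ish_amplified
    raise ValueError(f"unexpected HER2 IHC grade: {ihc_grade}")


def bc_subtype(er_allred: int, pgr_allred: int, ihc_grade: int,
               ish_amplified: Optional[bool] = None) -> str:
    """Map receptor status to the four subtype labels used in the paper."""
    hr = hr_positive(er_allred, pgr_allred)
    her2 = her2_positive(ihc_grade, ish_amplified)
    if not hr and not her2:
        return "TNBC"
    return f"HR{'+' if hr else '-'}HER2{'+' if her2 else '-'}"


print(bc_subtype(0, 0, 1))          # ER/PgR negative, HER2 IHC grade 1
print(bc_subtype(5, 0, 2, False))   # ER positive, HER2 grade 2 without amplification
```

Making the in situ hybridization result an explicit optional argument keeps the equivocal grade 2 case from being silently classified.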

Whole transcriptome sequencing (WTS)

Total RNA from fresh-frozen tumor tissues was extracted with an RNeasy Mini Kit (Qiagen, Hilden, Germany). Nucleic acid extraction was performed according to the manufacturer’s instructions. The quality and quantity of extracted nucleic acids were evaluated using a Nanodrop 8000 UV–Vis spectrometer (Thermo Fisher Scientific, Waltham, MA, USA), Qubit® 3.0 Fluorometer (Life Technologies, Inc., Carlsbad, CA, USA), and 4200 TapeStation (Agilent Technologies, Inc., Santa Clara, CA, USA). Sequencing libraries were prepared from fresh-frozen tissues with the TruSeq RNA Sample Preparation Kit v2 (Illumina, Inc., San Diego, CA, USA), following the manufacturer’s protocols. Paired-end sequencing of the RNA libraries was performed on a HiSeq 2500 Sequencing Platform (Illumina, Inc.).

Fusion detection, ImmuneScore and PAM50 subtyping

Fusions were predicted from RNAseq data using three fusion detection software programs with default parameters: STAR.Arriba v2.0.0 [36], STAR.fusion v1.9.1 (https://github.com/STAR-Fusion), and STAR.SEQR v0.6.7 [37]. Reads aligned using STAR-2.7.6a served as input for the fusion callers. Sequencing reads were aligned to the human reference sequence hg38. Fusions flagged as red herrings were filtered out based on healthy tissue samples or gene homology databases (GTEx_recurrent_StarF2019, BodyMap, DGD_PARALOGS, Greger_Normal, and Babiceanu_Normal). Read-through fusions, considered artifacts, were excluded. To eliminate false-positive fusions, we removed those with fewer than three supporting reads or without any split reads [38]. Fusion calls predicted by two or more of the three fusion detection programs were further analyzed. Fusions were annotated as known fusions if reported in public databases, including Mitelman 2023 [22] and the ChimerDB, COSMIC, and TCGA fusions in the FusionAnnotator fusion_lib.Mar2021.dat [39], leveraging the CTAT Human Fusion Lib database release v0.3.0. Known fusions in BC were manually reviewed on binary alignment and mapping (BAM) files using the Integrative Genomics Viewer (https://software.broadinstitute.org/software/igv/) [40]. Fusions were visualized as Circos plots using the circlize R package (version 0.4.14). ImmuneScore was calculated using the ESTIMATE R package (version 1.0.13) to estimate the immune microenvironment score for each sample from gene expression values in transcripts per million (TPM). Patients with an ImmuneScore higher than 60% of the total 297 samples were grouped as having a high ImmuneScore [41]. PAM50 intrinsic subtyping was performed using the genefu R package (v2.26.1) with the gene expression data [42].
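The filtering and consensus steps can be sketched as follows. This is an illustrative reconstruction, not the pipeline used in the study: the record layout, field names, and caller labels are hypothetical, the gene names are placeholders, and it assumes that the read-level filters are applied to each call before counting how many callers agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCall:
    sample: str
    gene5: str
    gene3: str
    caller: str                  # hypothetical labels: "arriba", "star_fusion", "star_seqr"
    supporting_reads: int        # total reads supporting the fusion
    split_reads: int             # reads spanning the breakpoint
    read_through: bool = False   # flagged as a read-through artifact
    red_herring: bool = False    # flagged via normal-tissue or paralog databases


MIN_SUPPORTING_READS = 3  # calls with fewer than three supporting reads are removed
MIN_CALLERS = 2           # keep fusions predicted by two or more of the three callers


def passes_call_filters(call: FusionCall) -> bool:
    return (
        call.supporting_reads >= MIN_SUPPORTING_READS
        and call.split_reads >= 1
        and not call.read_through
        and not call.red_herring
    )


def consensus_fusions(calls):
    """Return the set of (sample, gene5, gene3) supported by enough callers."""
    callers_by_fusion = {}
    for call in calls:
        if passes_call_filters(call):
            key = (call.sample, call.gene5, call.gene3)
            callers_by_fusion.setdefault(key, set()).add(call.caller)
    return {
        key for key, callers in callers_by_fusion.items()
        if len(callers) >= MIN_CALLERS
    }


# Placeholder calls for one sample.
calls = [
    FusionCall("S1", "GENE_A", "GENE_B", "arriba", 5, 2),
    FusionCall("S1", "GENE_A", "GENE_B", "star_fusion", 4, 1),
    FusionCall("S1", "GENE_C", "GENE_D", "arriba", 10, 3),    # single caller only
    FusionCall("S1", "GENE_E", "GENE_F", "arriba", 2, 1),     # too few reads
    FusionCall("S1", "GENE_E", "GENE_F", "star_seqr", 6, 2),
    FusionCall("S1", "GENE_G", "GENE_H", "arriba", 9, 4, read_through=True),
    FusionCall("S1", "GENE_G", "GENE_H", "star_seqr", 9, 4, read_through=True),
]
kept = consensus_fusions(calls)
print(kept)
```

Only the GENE_A/GENE_B fusion survives here: the others are either seen by a single caller after filtering or flagged as read-through.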

Whole exome sequencing (WES)

Pathologists determined tumor purity by reviewing tumor specimens, and samples with low tumor purity (cut-off, 20%) were excluded from sequencing. Genomic DNA was extracted from fresh-frozen tissues using the QIAamp DNA mini kit (Qiagen). Genomic DNA from peripheral blood was extracted using the QIAamp DNA blood maxi kit (Qiagen). Total RNA from fresh-frozen tumor tissues was extracted with an RNeasy mini kit (Qiagen) according to the manufacturer’s instructions. The quality and quantity of extracted nucleic acids were evaluated using the NanoDrop™ 8000 UV–Vis spectrometer (Thermo Fisher Scientific, Waltham, MA, USA), Qubit® 3.0 fluorometer (Thermo Fisher Scientific), and 4200 TapeStation (Agilent Technologies, Inc.).

High-quality gDNA in matched tumor and blood samples was sheared with an S220 ultra-sonicator (Covaris, Inc., Woburn, MA, USA) and used to construct a library with the SureSelect XT Human All Exon v5 and SureSelect XT reagent kit, HSQ (Agilent Technologies, Inc.), according to the manufacturer’s protocol. Libraries were pooled, denatured, and sequenced in 100-bp paired-end mode using the HiSeq rapid SBS kit v2 (200 Cycles) and HiSeq rapid PE cluster kit v2 on the Illumina HiSeq 2500 platform (Illumina, Inc.).

Reads were aligned to the human reference genome (hg19) using the Burrows–Wheeler alignment tool (BWA) v0.7.17 [43]. Sequence alignment and mapping (SAM) files were converted into BAM files using SAMtools v1.6. Duplicate reads were removed using Picard v2.9.4, base quality was recalibrated, and local realignment was optimized using the Genome Analysis Toolkit (GATK) v4.0.2.1 [44]. SNVs and indels were identified using MuTect2 v4.0.2.1. Copy number alteration was estimated by CONTRA v2.0.4 [45,46].

Tumor mutation burden and homologous recombination deficiency

The TMB (mutation load) was defined as the sum of the number of non-synonymous SNVs and indels. Genomic scar scores, including telomeric allelic imbalance (Telomeric.AI), loss of heterozygosity (HRD-LOH), and the number of large-scale state transitions (LST), were determined using the scarHRD R package v0.1.0 [47]. The sum of these three scores was referred to as the HRD score and indicated HRD status.
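Both definitions reduce to simple sums, as the sketch below shows. The variant representation and consequence labels are hypothetical, and the sketch assumes that "non-synonymous" qualifies only the SNVs so that every indel is counted; the scar scores would in practice come from scarHRD and the values here are invented.

```python
from dataclasses import dataclass


@dataclass
class ScarScores:
    telomeric_ai: int   # telomeric allelic imbalance
    hrd_loh: int        # loss of heterozygosity
    lst: int            # large-scale state transitions


def hrd_score(scores: ScarScores) -> int:
    """HRD score: sum of the three genomic scar scores."""
    return scores.telomeric_ai + scores.hrd_loh + scores.lst


# Hypothetical consequence labels treated as non-synonymous.
NON_SYNONYMOUS = {"missense", "nonsense", "nonstop"}


def tumor_mutation_burden(variants) -> int:
    """Count non-synonymous SNVs plus indels.

    variants: iterable of (kind, consequence) tuples, kind in {"snv", "indel"}.
    Assumption: all indels are counted regardless of consequence.
    """
    total = 0
    for kind, consequence in variants:
        if kind == "indel":
            total += 1
        elif kind == "snv" and consequence in NON_SYNONYMOUS:
            total += 1
    return total


example_variants = [
    ("snv", "missense"),
    ("snv", "synonymous"),
    ("indel", "frameshift"),
    ("snv", "nonsense"),
]
print(tumor_mutation_burden(example_variants))   # 3
print(hrd_score(ScarScores(10, 12, 20)))         # 42
```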

Validation study

To validate our finding of the association between fusions and EFS in TNBC and basal-like intrinsic subtypes, we used TNBC whole transcriptome sequencing data from Fudan University Shanghai Cancer Center (FUSCC) [48]. In total, RNAseq data from 115 TNBCs in the FUSCC BC cohort (SRA accession number: SRP157974) were used for fusion detection, applying the same software and filtering steps described above for the SMC data. Recurrence-free survival was compared according to the number of fusions by log-rank test.

Statistical analyses

For survival analysis, EFS was estimated using the Kaplan–Meier method and compared by log-rank test with the survminer R package (version 0.4.9). EFS was defined as the time between BC diagnosis and the first recurrence event, including local and distant metastases, contralateral BC development, and BC-specific death. High vs. low cutoff values were investigated by estimating survival differences with the log-rank test using consecutive cutoff values from 0.1 to 0.9 for all patients and various subgroups, and the cutoff that covered the most subgroups was selected (Supplementary Table 9). Patients with a fusion count exceeding that of 60% of the patients analyzed were classified as having a high fusion burden. Patients with an ImmuneScore higher than that of 60% of the patients analyzed were categorized as having a high ImmuneScore. The five-year EFS rate was calculated, including a 95% confidence interval (CI). All statistical analyses were conducted using R version 4.1.2. Adjusted p-values, calculated using the false discovery rate, were employed to determine statistical significance. P-values less than 0.05 from the Wilcoxon rank-sum test, Kruskal–Wallis test, or correlation tests were deemed statistically significant.
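The cutoff search can be sketched as a scan over quantiles with a two-group log-rank test at each step. This is a simplified, dependency-free illustration rather than the survminer-based procedure used in the study: it assumes the cutoffs are quantiles of the fusion count with linear interpolation, that patients strictly above the cutoff are "high", and it omits the subgroup comparison and false discovery rate adjustment. All data are invented.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile (assumed convention)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns the p-value (chi-square, 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({time for time, event, _ in pooled if event}):
        n_a = sum(1 for time, _, g in pooled if time >= t and g == 0)
        n_b = sum(1 for time, _, g in pooled if time >= t and g == 1)
        d_a = sum(1 for time, e, g in pooled if time == t and e and g == 0)
        d_b = sum(1 for time, e, g in pooled if time == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with one degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def scan_cutoffs(counts, times, events,
                 quantiles=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Return {quantile: (cutoff, log-rank p)} for high vs. low fusion groups."""
    results = {}
    for q in quantiles:
        cutoff = quantile(counts, q)
        high = [i for i, c in enumerate(counts) if c > cutoff]
        low = [i for i, c in enumerate(counts) if c <= cutoff]
        if not high or not low:
            continue
        p = logrank_p([times[i] for i in high], [events[i] for i in high],
                      [times[i] for i in low], [events[i] for i in low])
        results[q] = (cutoff, p)
    return results


# Hypothetical cohort: patients with many fusions relapse early.
counts = [1, 2, 3, 4, 5, 12, 13, 14, 15, 16]
times = [10, 10, 10, 10, 10, 1, 2, 3, 4, 5]
events = [False] * 5 + [True] * 5
results = scan_cutoffs(counts, times, events)
for q, (cutoff, p) in results.items():
    print(q, cutoff, round(p, 4))
```

Scanning many cutoffs and keeping the best-separating one inflates the chance of a spurious difference, which is one reason a validation cohort and adjusted p-values matter for this kind of analysis.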