Background

The central role of the immune system in cancer therapeutics is increasingly being recognized [1], and cancer immunotherapy, along with surgery, radiotherapy, chemotherapy, and targeted therapy, is becoming a powerful method for treating malignant tumors [2, 3]. Immune checkpoint inhibitors (ICIs), one of the most promising types of immunotherapy, have achieved significant success in a variety of cancers [4]. Hence, an effective scoring system to evaluate patient prognosis and gain insight into tumor immunity is urgently required.

Immunotherapy has also been used to treat patients with breast cancer in recent years, but its therapeutic efficacy is affected by many factors. The composition of the tumor immune infiltrate, consisting of multiple immune cells, is an important determinant of tumor-immune interactions. In several cancer types, tumor mutational burden is a biomarker of ICI treatment efficacy, and neoantigens produced by somatic tumor mutations can be recognized by the host immune system and influence the immunotherapy response of patients. Long non-coding RNAs (lncRNAs), which are more than 200 nucleotides in length and do not encode proteins, play key roles in a wide range of biological and cellular functions. Recent studies have shown that lncRNAs are important elements of the immune system, and they have attracted considerable attention in cancer immunity. For example, Ni et al. found that lncRNA SNHG16, transferred by breast cancer exosomes, can upregulate the expression of SMAD5 in γδ1 T cells [5]. Lnczc3h7a binds to TRIM25 and activates RIG-I, which can initiate an antiviral immune response following the recognition of pathogenic RNA [6]. Other analyses identified immune-related lncRNAs from The Cancer Genome Atlas (TCGA) and demonstrated their diagnostic and prognostic performance [7, 8].

In this study, we identified 15 ICI-related mRNAs and five immune-related lncRNAs using bioinformatics analysis in multiple databases. Based on these RNAs, an immune score (IS) for breast cancer was developed and found to be associated with survival outcomes. Finally, we explored the connection between IS and immune-related features in TCGA (Fig. 1).

Fig. 1

Strategy for identifying ICI-related mRNAs and immune-related lncRNAs in this study. ICI, immune checkpoint inhibitor

Materials and methods

Cohort dataset collection

RNA raw counts and clinical data of the GSE91061 dataset [9] were downloaded from the Gene Expression Omnibus (GEO). Data and clinical information for IMvigor 210 were collected using the R package IMvigor210CoreBiologies [10]. Transcriptional profiles of 59 breast carcinoma cell lines based on the Affymetrix HG-U133_Plus 2.0 platform were downloaded from the Cancer Cell Line Encyclopedia (CCLE) project (https://depmap.org/portal/download/) [11]. From GEO datasets based on the Affymetrix HG-U133_Plus 2.0 platform, we obtained data for 930 patients with breast cancer from seven datasets (GSE16446, GSE20685, GSE20711, GSE42568, GSE48390, GSE58812, GSE88770) with complete survival data (Supplementary Table 1) and 152 transcriptional profiles of 19 immune cell types in 13 datasets (Supplementary Table 2). For the TCGA Breast Cancer (BRCA) project, masked copy number segments and RNA-Seq expression (FPKM, counts) data were downloaded using the TCGAbiolinks R package [12]. Masked SNV data were downloaded from the TCGA program. The immune-related features and clinical data of TCGA patients with breast cancer were downloaded from the Genomic Data Commons database [13].

Selection of gene signatures for gene set enrichment analysis

We selected gene signatures (Supplementary Table 3) to identify four immune cell populations (activated CD4 T cells, cytotoxic cells, activated CD8 T cells, and B cells) from the supplemental materials of four different articles [14, 15, 16, 17]. Hallmark gene sets were acquired from the Molecular Signatures Database (MSigDB) version 7.2 [18, 19].

Immune-related lncRNA and ICI-related mRNA collection

According to the HG-U133_Plus_2 Annotations file (Release 36), we sorted the ‘Ensembl gene IDs’ and ‘Refseq IDs’ corresponding to the microarray probes. Annotation files of GENCODE and Refseq were downloaded from their official websites and used to screen for lncRNAs. Finally, we obtained 2145 unique lncRNAs corresponding to 2957 probe sets for further analysis (Supplementary Table 4).

The raw data (.cel files) of CCLE and immune cells were normalized using the Robust Multi-array Average (RMA) method [20]. Among the 2957 probe sets, 35 genes with expression in the top 5% in the 19 normal immune cell types and in the bottom 5% in the 59 breast carcinoma cell lines were selected as immune-related lncRNAs.
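The selection rule can be sketched as a percentile filter. The following Python snippet is a minimal illustration, not the authors' code (the analysis was run in R). The function name, the choice to rank probes by their mean expression within each group, and the toy values are all assumptions made for this example.

```python
from statistics import mean


def select_immune_related(immune_expr, tumor_expr, fraction=0.05):
    """Return probes in the top `fraction` by mean expression in immune cells
    and in the bottom `fraction` by mean expression in tumor cell lines.

    Both arguments map a probe ID to a list of normalized expression values.
    Ranking by the per-probe mean is an assumption of this sketch.
    """
    probes = sorted(set(immune_expr) & set(tumor_expr))
    n_keep = max(1, int(len(probes) * fraction))
    # Highest mean expression first for immune cells.
    by_immune = sorted(probes, key=lambda p: mean(immune_expr[p]), reverse=True)
    # Lowest mean expression first for tumor cell lines.
    by_tumor = sorted(probes, key=lambda p: mean(tumor_expr[p]))
    return sorted(set(by_immune[:n_keep]) & set(by_tumor[:n_keep]))


# Toy data: 40 hypothetical probes, so 5% corresponds to 2 probes per list.
immune = {f"probe_{i:02d}": [float(i), float(i) + 0.5] for i in range(40)}
tumor = {f"probe_{i:02d}": [40.0 - i, 40.5 - i] for i in range(40)}
selected = select_immune_related(immune, tumor)
print(selected)
```

Taking the intersection of the two ranked lists keeps only probes that satisfy both conditions at once, which is why the final list is much shorter than 5% of all probes.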

The RNA sequencing raw count data of GSE91061 and IMvigor 210 were transformed using the ‘limma::voom’ algorithm [21]. Complete response (CR) and partial response (PR) were classified as effective immunotherapy, whereas stable disease (SD) and progressive disease (PD) were classified as ineffective immunotherapy. The immune-related genes were obtained from the ImmPort website (https://www.immport.org/) [22]. Based on the effective and ineffective treatment labels, 45 ICI-related mRNAs were selected using logistic regression analysis performed on the two datasets (Supplementary Table 5).
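To make the labeling and screening step concrete, the sketch below maps response categories to a binary outcome and fits a single-gene logistic regression by Newton-Raphson. It is an illustration under stated assumptions only: the function names, the toy expression values, and the hand-written solver are ours, and the original analysis was performed in R with settings that are not described here.

```python
import math

EFFECTIVE = {"CR", "PR"}      # complete or partial response
INEFFECTIVE = {"SD", "PD"}    # stable or progressive disease


def binarize_response(label):
    """Map a response category to 1 (effective) or 0 (ineffective)."""
    label = label.strip().upper()
    if label in EFFECTIVE:
        return 1
    if label in INEFFECTIVE:
        return 0
    raise ValueError(f"unknown response category: {label!r}")


def fit_univariate_logistic(x, y, n_iter=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient w.r.t. intercept
            g1 += (yi - p) * xi       # gradient w.r.t. slope
            h00 += w                  # entries of X^T W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Solve the 2x2 system (X^T W X) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy example: hypothetical expression values for one gene in eight patients.
responses = ["CR", "PD", "PR", "SD", "PD", "PR", "SD", "CR"]
expression = [2.9, 1.0, 2.1, 1.8, 2.3, 1.6, 1.2, 2.5]
labels = [binarize_response(r) for r in responses]
b0, b1 = fit_univariate_logistic(expression, labels)
print(labels, b1 > 0)
```

A positive slope means higher expression goes with a higher probability of an effective response in the toy data. Screening many genes would repeat this fit per gene and keep those passing a significance threshold, which this sketch does not compute.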

Construction of IS

Expression profiles (.cel files) of patients with breast cancer from GEO were processed using RMA, and batch effect reduction was performed using the ComBat function from the R package sva. We used the GSE20685 set as the test set and the remaining datasets as the training set. Univariate Cox regression analysis was performed to screen significant prognostic RNAs (spRNAs) related to survival from the ICI-related mRNAs and immune-related lncRNAs in the training set. The coefficient for each spRNA was obtained using multivariate Cox regression. IS was calculated as the sum over all spRNAs of the coefficient multiplied by the expression level: IS = Σ (coefficient of spRNA i × expression level of spRNA i).
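As an illustration of the scoring rule, the following Python sketch computes the IS for one sample. It is not the authors' code (the analysis was performed in R). The function name is ours, and the coefficient and expression values are made-up placeholders, not the fitted values from this study.

```python
def immune_score(expression, coefficients):
    """Weighted sum of expression levels, one term per spRNA.

    `coefficients` maps each spRNA to its multivariate Cox coefficient and
    `expression` maps each spRNA to its expression level in one sample.
    """
    missing = [gene for gene in coefficients if gene not in expression]
    if missing:
        raise KeyError(f"expression missing for: {missing}")
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


# Placeholder coefficients (NOT the values fitted in the study).
example_coefficients = {"IL33": -0.2, "CCL5": -0.1, "LINC01133": 0.3}
# Placeholder normalized expression for one hypothetical patient.
example_expression = {"IL33": 5.0, "CCL5": 8.0, "LINC01133": 2.0}

score = immune_score(example_expression, example_coefficients)
print(round(score, 3))
```

Because the coefficients are estimated once in the training set, the same dictionary can be applied unchanged to samples from an independent cohort.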

Statistical analyses

Most statistical analyses were performed using R (version 4.0.2) with default arguments unless mentioned otherwise. The Shapiro-Wilk test was used to test normality. The Wilcoxon test was used to assess statistical significance between two groups, whereas the Kruskal-Wallis test was applied to comparisons of multiple groups.

Differential expression analysis was performed using the R package DESeq2 [23]. The survival and survminer packages were applied for survival analysis, and the time-dependent receiver operating characteristic (ROC) and area under the curve (AUC) were determined using the R package “survivalROC”. The ESTIMATE package was used to compute the stromal score, ESTIMATE immune score, ESTIMATE score, and tumor purity. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed using the R package “clusterProfiler” [24], and the “GSVA” package [25, 26] was used for single-sample gene set enrichment analysis (ssGSEA). Single nucleotide polymorphism (SNP) analysis and visualization of results were performed using the package “maftools” [27]. P-values were two-sided, and statistical significance was set at P < 0.05.

Results

Development and evaluation of IS in the GEO cohort

After univariate Cox analysis, we found that the expression of five lncRNAs and 15 mRNAs was significantly correlated with the overall survival (OS) of patients with breast cancer in the training dataset. We then obtained the coefficient of each gene using multivariate Cox regression and calculated the IS. The patients in the training set were divided into low- and high-score groups according to the median IS.
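The median split can be sketched in a few lines of Python. This is an illustration only: the function name and patient scores are hypothetical, and assigning a score exactly equal to the median to the low-score group is an assumption, since the handling of ties is not stated.

```python
from statistics import median


def split_by_median(scores):
    """Assign each patient to the 'high' or 'low' group by the median score.

    `scores` maps patient ID -> IS. Scores equal to the median go to 'low'
    (an assumption of this sketch).
    """
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return cutoff, groups


# Hypothetical IS values for five patients.
example_scores = {"P1": -1.2, "P2": 0.4, "P3": 0.1, "P4": -0.3, "P5": 0.9}
cutoff, groups = split_by_median(example_scores)
print(cutoff, groups)
```

When a score is validated in an independent cohort, the cutoff can either be carried over from the training set or recomputed in the new cohort; the two choices answer slightly different questions and should be reported explicitly.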

Survival curves (Kaplan-Meier estimates) revealed that OS in the low-score group was significantly longer than that in the high-score group (HR, 0.37; 95% confidence interval [CI], 0.28–0.51; log-rank P < 0.001) (Fig. 2A). The AUC of IS for OS was 0.72 at 3 years, 0.716 at 5 years, and 0.692 at 10 years (Fig. 2B).
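For readers unfamiliar with the estimator behind these curves, the following is a minimal Kaplan-Meier sketch in Python. It is illustrative only: the study used the R survival and survminer packages, and the function name and toy follow-up data below are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    `times` are follow-up times; `events` are 1 for an observed death and
    0 for a censored observation.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= removed
    return curve


# Toy cohort of six patients (times in years).
toy_times = [2, 3, 3, 5, 8, 9]
toy_events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(toy_times, toy_events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored patients contribute to the number at risk up to their last follow-up but never trigger a drop in the curve, which is what distinguishes this estimate from a simple proportion of survivors.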

Fig. 2

The IS in the GEO cohort. (A) Kaplan–Meier survival curves of OS according to IS groups in the GEO training set. (B) ROC curves at 3, 5, and 10 years of OS according to IS groups in the GEO training set. (C) Kaplan–Meier survival curves of OS according to IS groups in the GEO test set. (D) ROC curves at 3, 5, and 10 years of OS according to IS groups in the GEO test set. (E) The ESTIMATE results in the GEO cohort. IS, immune score; OS, overall survival; GEO, Gene Expression Omnibus; ROC, receiver operating characteristic; AUC, area under the curve

The prognostic value of IS was further tested using GSE20685 without re-estimating the parameters. Similarly, patients in the high-score group had a significantly shorter OS than those in the low-score group (HR, 0.26; 95% CI, 0.16–0.51; log-rank P < 0.001) (Fig. 2C). The AUC was 0.798, 0.772, and 0.671 at 3, 5, and 10 years of OS, respectively (Fig. 2D).

In the whole GEO cohort, patients in the low-score group had significantly higher stromal scores (Wilcoxon test, P < 0.001), ESTIMATE immune scores (Wilcoxon test, P < 0.001), and ESTIMATE scores (Wilcoxon test, P < 0.001), and lower tumor purity (Wilcoxon test, P < 0.001), than those in the high-score group (Fig. 2E).

Prognostic power evaluation of IS in TCGA cohort

Kaplan-Meier survival curves revealed that OS was significantly prolonged in the low-score group compared to that in the high-score group (HR, 0.38; 95% CI, 0.27–0.53; log-rank P < 0.001) (Fig. 3A). The AUCs of the IS were 0.753, 0.765, and 0.711 at 3, 5, and 10 years, respectively (Fig. 3B).

Fig. 3

The IS in the TCGA cohort. (A) Kaplan–Meier survival curves of OS according to IS groups in TCGA cohort. (B) ROC curves at 3, 5, and 10 years of OS according to IS groups in the TCGA cohort. (C) Kaplan–Meier survival curves of PFS according to IS groups in TCGA cohort. (D) ROC curves at 3, 5, and 10 years of PFS according to IS groups in the TCGA cohort. (E) Forest plots of OS. (F) Forest plots of PFS. IS, immune score; OS, overall survival; TCGA, The Cancer Genome Atlas; ROC, receiver operating characteristic; AUC, area under the curve; PFS, progression free survival; AIC, Akaike information criterion

We used progression-free survival (PFS) as a recurrence measure to examine the effectiveness of IS for predicting recurrence risk in TCGA. PFS in the low-score group was significantly longer than that in the high-score group (HR, 0.45; 95% CI, 0.33–0.64; log-rank P < 0.001) (Fig. 3C). The AUCs of the IS were 0.672, 0.62, and 0.683 at 3, 5, and 10 years, respectively (Fig. 3D).

Moreover, to evaluate the independent prognostic effect of IS, we used a multivariate Cox regression model to adjust for other factors, including age and TNM stage, which indicated that IS remained a significant and independent prognostic indicator in the TCGA cohort (Fig. 3E-F).

Different immune-related features between low- and high-score patients in TCGA cohort

First, we examined the distribution of IS across the five immune subtypes of patients with breast cancer. A substantial difference was observed between the five immune subtypes in the Kruskal-Wallis test (P < 2.20E-16), as shown in Fig. 4A, and patients with type C4 (lymphocyte-depleted) had a higher IS than those with the other four subtypes. A molecular subtype imbalance was also found, with 73% of HR-/HER2+ tumors falling in the high-score group compared to 44% of TNBC tumors (Fig. 4B).

Fig. 4

Exploration of the role of the IS in the TCGA cohort. (A) Raincloud plot showing the comparison of IS between the different immune subtypes. (B) Sankey plot showing the distribution of the different groups in C1–C6 subtypes and molecular subtypes. (C) Box plots of CTLA4, PD-1, and PD-L1 expression for the two IS groups. (D) Lollipop plot showing the comparison of immune-related features between the low-score and high-score groups. The length of the stick represents the difference between the medians of the features in the high and low groups. (E) Waterfall plot displaying the distribution of SNPs in IS-relevant groups; the left chart represents the probability of CNV events (amplification and deletion). IS, immune score; TCGA, The Cancer Genome Atlas; FPKM, Fragments Per Kilobase Million; HRD, homologous recombination deficiency; SNP, single nucleotide polymorphism; CNV, copy number variation

Immune checkpoint molecules such as CTLA4, PD-1, and PD-L1 are crucial in modulating immune responses and are key targets in cancer immunotherapy. PD-1/PD-L1 inhibitors promote antitumor immunity by blocking inhibitory signals, enhancing T-cell activity against cancer cells. CTLA4 inhibitors boost T-cell activation and proliferation. In breast cancer, the expression of these molecules influences the effectiveness of immunotherapy and impacts patient outcomes [28]. Therefore, we investigated whether IS is associated with these three major immune checkpoint molecules. As shown in Fig. 4C, CTLA4, PD-1, and PD-L1 expression (FPKM) was higher in the tumors of the low-score group, and according to the Wilcoxon test, the differences in CTLA4, PD-1, and PD-L1 expression between the two IS groups were statistically significant.

In the following analysis, we explored differences in the composition of the tumor immune infiltrate, somatic/germline variation, and immunogenicity between the IS-based subgroups of the TCGA dataset (Fig. 4D). Compared to the high-score group, the low-score group had a higher stromal fraction, leukocyte fraction, and tumor-infiltrating lymphocyte (TIL) regional fraction, which was consistent with the GEO cohort. Features reflecting tumor somatic or germline variation, including aneuploidy score, number of segments, homologous recombination defects (HRD), intratumor heterogeneity, fraction altered, silent mutation rate, and non-silent mutation rate, were compared between the high-score and low-score groups. With the exception of intratumor heterogeneity, the median values of these variables were substantially lower in the low-score group than in the high-score group. Additionally, SNV neoantigen levels were higher in the high-score group than in the low-score group.

Finally, because of variations in mutations between the two groups, we analyzed the mutation annotation files to display the distribution of SNPs in the two groups (Fig. 4E).

Identification of IS-related biological pathways and processes

To explore the underlying mechanism of the prognostic signature, we performed a differentially expressed gene (DEG) analysis between the high- and low-score groups. After filtering the DESeq2 results (|log2FC| > 1 and P-value < 0.01), 836 significant DEGs remained, of which 321 were upregulated and 514 were downregulated in the high-score group compared to the low-score group (Fig. 5A).
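The filtering step can be sketched as follows in Python. This is a hedged illustration rather than the authors' pipeline: DESeq2 itself is an R package, the gene identifiers and values below are invented, and whether the raw or the adjusted P-value is passed in is left to the caller because it is not stated unambiguously.

```python
from collections import Counter


def classify_deg(log2fc, p_value, lfc_cutoff=1.0, p_cutoff=0.01):
    """Label one gene as 'up', 'down', or 'ns' (not significant)."""
    if p_value >= p_cutoff or abs(log2fc) <= lfc_cutoff:
        return "ns"
    return "up" if log2fc > 0 else "down"


# Hypothetical results table: gene -> (log2 fold change, P-value),
# with the high-score group as numerator of the fold change.
toy_results = {
    "GENE_A": (2.3, 0.0004),
    "GENE_B": (-1.8, 0.002),
    "GENE_C": (0.6, 0.0001),   # significant P-value but small effect
    "GENE_D": (-3.1, 0.2),     # large effect but not significant
    "GENE_E": (-1.2, 0.009),
}

labels = {gene: classify_deg(lfc, p) for gene, (lfc, p) in toy_results.items()}
counts = Counter(labels.values())
print(labels)
print(counts["up"], counts["down"], counts["ns"])
```

Requiring both thresholds removes genes that are statistically significant but change little, as well as genes with large but noisy fold changes.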

Fig. 5

Functional characteristics of high-score and low-score groups. (A) Volcano plot of DEGs. (B) GSEA analysis of the DEGs between high-score and low-score groups. (C) Significantly enriched GO terms pathways for significant DEGs. (D) Significantly enriched KEGG pathways for significant DEGs. (E) Heatmap showing the activation status of the biological processes in different groups. DEG, differentially expressed genes; GSEA, gene set enrichment analysis; GO, Gene Ontology; BP, Biological Process; CC, Cellular Component; MF, Molecular Function; KEGG, Kyoto Encyclopedia of Genes and Genomes

GO and KEGG functional enrichment demonstrated that expression alterations of these genes were associated not only with immune-relevant terms such as ‘humoral immune response’, ‘T cell receptor complex’, ‘T cell receptor signaling pathway’ and ‘lymphocyte differentiation’ but also with processes linked to tumor progression such as ‘positive regulation of cell activation’, ‘positive regulation of cell killing’ and ‘NF-kappa B signaling pathway’ (Fig. 5C-D).

GSEA showed that the gene sets of B cells, activated CD8 T cells, activated CD4 T cells, and cytotoxic cells were substantially enriched in the low-score group (Fig. 5B). To gain insight into the biological processes, we conducted ssGSEA using hallmark gene sets. The heatmap (Fig. 5E) showed that samples with low IS had high ssGSEA scores in immunity- and inflammatory response-related processes such as ‘TNFa signaling via NFkb’, ‘IL6 jak stat3 signaling’, ‘inflammatory response’ and ‘IL2 stat5 signaling’.
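The idea behind an enrichment score can be illustrated with a simplified, unweighted running-sum statistic. The Python sketch below is not the implementation used in this study: it omits the expression-based weighting and the significance testing applied in practice, and the gene names and function name are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    Walk down the ranked list, stepping up at genes in the set and down at
    genes outside it; return the running sum with the largest magnitude.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Placeholder ranking, e.g. genes ordered from most up- to most down-regulated.
ranking = ["A", "B", "C", "D", "E", "F"]
top_score = enrichment_score(ranking, {"A", "B"})
bottom_score = enrichment_score(ranking, {"E", "F"})
print(top_score, bottom_score)
```

A strongly positive score means the set's genes cluster at the top of the ranking, and a strongly negative score means they cluster at the bottom, so the sign depends on how the two groups were ordered when the ranking was built.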

Discussion

Prognostic signatures of immune-related genes have recently received increasing attention [29, 30]. Some of these studies focused on the immune microenvironment, using linear models or rank-based models (such as CIBERSORT or ssGSEA) to approximate the relative distributions of immune cells from the gene expression profiles of bulk samples. Models based on the abundance of different immune cell populations are valuable for investigating the immune environment, assessing the effectiveness of immunotherapy, and predicting survival [14, 31, 32]. However, constructing such models requires full mRNA sequencing or at least a microarray, which is difficult to perform in clinical practice due to the high cost.

Of the 20 genes included in the model in this study, 15 were ICI treatment-associated mRNAs and five were immune cell-associated lncRNAs, the vast majority of which were associated with breast cancer. In mouse models of spontaneous breast cancer metastasis and in patients with breast cancer with lung metastasis, Shani et al. discovered that IL33 expression is elevated in metastasis-associated fibroblasts [33]. Additionally, Wang et al. demonstrated that miR-325-3p promotes proliferation, invasion, and EMT of breast cancer cells by directly targeting S100A2, highlighting the significance of the miR-325-3p/S100A2 axis in breast cancer progression [34]. According to one study [35], DPYSL2 deletion significantly reduces the tumor growth rate, metastasis, invasion, and migration of mesenchymal-like breast cancer cells. According to Walen et al. [36], CCL5 promotes the growth of CCR5-expressing macrophages, which may help deposit collagen in recurrent tumors. The TNF-CCL5-macrophage axis may be effectively blocked to prevent the recurrence of breast cancer. Song et al. [37] found that LINC01133 expression is significantly downregulated in breast cancer samples and is linked to disease development and poor prognosis. Additional research has revealed that LINC01133 inhibits invasion and metastasis in breast cancer both in vitro and in vivo by attracting EZH2 to the SOX4 promoter and suppressing SOX4 expression. However, the lncRNAs AL391807.1 and MSC-AS1 and the mRNA RASGRP1 have not been studied in breast cancer, and further investigation of the relationship and mechanism of these genes in breast cancer is needed. Furthermore, while our findings are specific to breast cancer, there is potential for similar immune-related gene signatures to be explored in other women’s cancers, such as ovarian and endometrial cancer [38, 39], which may offer valuable insights for prognostic predictions and personalized treatment strategies.

Thorsson et al. conducted an extensive immunogenomic study of more than 10,000 tumors spanning 33 different types of cancer, using data collected from TCGA [13]. Using five immune expression signatures (macrophages/monocytes, overall lymphocyte infiltration, TGF-β response, IFN-γ response, and wound healing), they classified solid tumors into six major immune subtypes: C1 (wound healing), C2 (IFN-γ dominant), C3 (inflammatory), C4 (lymphocyte-depleted), C5 (immunologically quiet), and C6 (TGF-β dominant). According to this approach, BRCA can be classified into five subtypes (C1, C2, C3, C4, and C6). We used the results of Thorsson et al. to investigate the association between IS and immune-related features. In their study, the regional TIL fraction was estimated by reviewing digitized TCGA hematoxylin and eosin-stained slides, and the leukocyte fraction was assessed from DNA methylation probes. Neither was determined using RNA expression, yet both differed substantially between the two IS subgroups. In addition, HRD, number of segments, fraction altered, and SNV neoantigen levels derived from genomic data were higher in the high-score group than in the low-score group. Taken together, patients in the low-score group had higher infiltration of immune cells, whereas those in the high-score group had a higher number of mutations and neoantigens.

To further investigate the functional differences between the two groups, we identified DEGs between them. We performed GO and KEGG enrichment analyses using the significant DEGs, and most of the top 20 significantly enriched GO terms and KEGG pathways were immune-related. We ranked the genes by fold change to perform GSEA and found that B cells, activated CD8 T cells, activated CD4 T cells, and cytotoxic cells were more enriched in the low-score group.

This study has some limitations. First, it relies purely on bioinformatics, and subsequent studies should further investigate the molecular mechanisms underlying IS. Second, the establishment of IS was partially based on ICI treatment profiles, but no large-scale evidence is yet available to verify that it predicts immunotherapy efficacy.

Conclusions

Our study introduces an immune score (IS) derived from ICI-related mRNAs and immune-related lncRNAs, providing a predictive tool for prognosis in patients with breast cancer. The IS was associated with survival outcomes: a high IS predicted poorer prognosis, whereas a low IS was associated with greater immune infiltration and better survival. These findings support the integration of IS in clinical settings to refine prognostic assessments and guide personalized immunotherapy strategies, which may improve the management and treatment outcomes of patients with breast cancer.