1 Introduction

Urological malignancies are commonly regarded as a significant threat to the health of middle-aged and elderly men. Among them, bladder cancer (BLCA), which ranks tenth among malignant tumors worldwide, has shown increasing incidence and mortality rates over the years [1]. Clinically, BLCA can be categorized into non-muscle-invasive BLCA and muscle-invasive BLCA [2]. Approximately 75% of BLCA cases are of the non-muscle-invasive type, for which transurethral resection of bladder tumor (TURBT), intravesical Bacillus Calmette-Guérin (BCG) therapy, and chemotherapy have demonstrated favorable therapeutic effects [3]. However, the 5-year recurrence rate of non-muscle-invasive BLCA remains high at 50–70%, leading to challenges such as frequent recurrence and a high risk of metastasis with limited subsequent treatment options [4]. With a deepening understanding of BLCA, new approaches such as immunotherapy, targeted therapy, and antibody–drug conjugates have shown promising efficacy. In particular, the advent of immunotherapy has revolutionized the treatment landscape of BLCA, with immune checkpoint inhibitors (ICIs) such as PD-1/PD-L1 and CTLA-4 monoclonal antibodies ushering in a new era of BLCA immunotherapy [5]. As positive data from immunotherapy clinical trials have accumulated, immunotherapy has been shown to significantly reduce the risk of disease progression or death in patients with advanced or metastatic urothelial carcinoma, resulting in a marked improvement in overall survival (OS) [6]. However, primary and acquired resistance remain a challenge, and some patients with advanced BLCA fail to derive significant benefit from immunotherapy.

The tumor microenvironment (TME) is a complex system composed of tumor cells, tumor-infiltrating immune cells (TIICs), and extracellular matrix components, playing a crucial role in tumor development. It not only provides nutrients and growth conditions for tumor cells but also regulates immune escape and drug resistance of tumor cells [7,8,9]. TIICs are a key component of the TME, possessing tumor recognition and attack capabilities, and playing important roles in immunotherapy. TIICs are heterogeneous, exhibiting functional and phenotypic plasticity, and may exert both pro-tumor and anti-tumor effects. TIICs enhance the efficacy of immunotherapy and reduce tumor recurrence and metastasis through mechanisms such as direct attack on tumor cells, formation of immune memory, regulation of immune cell subpopulations, and blockade of immune inhibitory factors [10]. However, the presence of regulatory T cells (Tregs) and tumor-associated macrophages may release inhibitory cytokines such as TGF-β and IL-10, attenuating the activity of tumor-infiltrating T cells [11]. Additionally, tumor cells may evade immune surveillance by reducing tumor antigen expression or altering antigen structure, as well as by the accumulation of myeloid-derived suppressor cells (MDSCs), thereby promoting the occurrence of immunotherapy resistance [12].

With the rapid development of bioinformatics technologies, genome sequencing and high-throughput sequencing technologies have not only reduced the cost of TME detection but also enabled in-depth understanding of the TME [13]. Meanwhile, machine learning (ML) algorithms and artificial intelligence have become important tools for analyzing complex biomedical data, discovering therapeutic targets, and predicting clinical outcomes [14]. Based on the development of these technologies, multiple evaluation models for diagnosis, prognosis, or therapeutic sensitivity have been constructed and applied in clinical practice, including MSINet, a transfer learning model based on MobileNetV2 architecture, which accurately classifies the microsatellite instability status in colorectal cancer tissue slice images with an AUC of 0.93 [15]. DeepPATH, based on the Inception-v3 architecture, is a deep learning model used for three-class classification of lung tissue WSI (normal, lung adenocarcinoma, and lung squamous cell carcinoma), with a high AUC of 0.97 [16].

Therefore, in this study, we developed a signature related to TIICs based on analysis of single-cell RNA sequencing (scRNAseq) data and bulk RNA sequencing data using 20 ML algorithms. We systematically explored the performance of the TIIC signature in predicting prognosis and immunotherapy response in BLCA patients.

2 Methods

2.1 Collection and processing of the BLCA transcriptome data

In constructing the model, we opted for BLCA RNA expression profiles and corresponding clinical data from the TCGA database (https://portal.gdc.cancer.gov/, n = 406). To validate the model's stability and accuracy, we utilized datasets from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) with sample sizes exceeding 50, including GSE160693 [17] (n = 52), Xiangya_Cohort (GSE188715, n = 56) [18, 19], and GSE176307 (n = 85) [20] as the validation set for RNA-seq. All data were transformed into TPM format, except for GSE160693, which was converted to log2 format for subsequent analysis. Additionally, we employed BLCA microarray data from GEO datasets with sample sizes > 50, including GSE13507 (n = 165) [21, 22], GSE31684 (n = 93) [23], GSE32894 (n = 224) [24], and GSE48075 (n = 73) [25], as the validation set. We conducted data normalization on the microarray data using the normalizeBetweenArrays function from the "limma" package [26]. Furthermore, immunotherapy data were obtained from IMvigor210 (along with data predicted by TIDE online), sourced from the "IMvigor210CoreBiologies" [27] R package.

2.2 Collection and processing of the BLCA scRNA-seq data

The single-cell dataset was obtained from the GEO database (GSE169379) and comprises 25 BLCA tumor samples. Data analysis was performed in R (version 4.1.3) using the "Seurat" package [28]. During quality control, cells with mitochondrial content exceeding 10% were excluded, and cells were retained only if their UMI counts (200–20,000) and gene counts (200–8000) fell within the stated ranges. Data processing included normalization, selection of highly variable genes (2000 genes), and removal of cell cycle effects (using vars.to.regress = c("S.Score", "G2M.Score")), corresponding to the functions NormalizeData, FindVariableFeatures, and ScaleData in the "Seurat" package, respectively. To address batch effects, the harmony method was employed. Subsequent analysis involved dimensionality reduction with UMAP and t-SNE and clustering with the Louvain algorithm, all implemented within the "Seurat" package. Differential gene expression between clusters or cell types was calculated using the FindAllMarkers function, with thresholds of p-value less than 0.05, log2 fold change greater than 0.25, and expression fraction greater than 0.1.
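The quality-control thresholds above can be illustrated with a minimal sketch. It is written in plain Python rather than with the "Seurat" functions used in the study, and the `CellQC` record, the `passes_qc` helper, the toy barcodes, and the choice of inclusive range boundaries are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical record, not a Seurat object)."""
    barcode: str
    n_umi: int        # total UMI counts in the cell
    n_genes: int      # number of detected genes
    pct_mito: float   # percentage of counts from mitochondrial genes


def passes_qc(cell, max_pct_mito=10.0, umi_range=(200, 20_000),
              gene_range=(200, 8_000)):
    """Apply the three thresholds from the text.

    Assumption: range boundaries are treated as inclusive.
    """
    return (
        cell.pct_mito <= max_pct_mito
        and umi_range[0] <= cell.n_umi <= umi_range[1]
        and gene_range[0] <= cell.n_genes <= gene_range[1]
    )


# Toy cells, invented for this example.
cells = [
    CellQC("AAAC", n_umi=5_000, n_genes=1_800, pct_mito=4.2),    # within all limits
    CellQC("AAAG", n_umi=150, n_genes=120, pct_mito=2.0),        # too few UMIs and genes
    CellQC("AACT", n_umi=9_000, n_genes=3_000, pct_mito=18.5),   # high mitochondrial content
    CellQC("AAGG", n_umi=45_000, n_genes=8_500, pct_mito=3.1),   # above the upper limits
]

kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

Only the first toy cell satisfies all three criteria; the others each violate at least one threshold.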

2.3 Obtaining TIIC-related genes

Through comprehensive analysis at the scRNAseq level of immune cells, tumor cells, and bulk level of BLCA cancer tissues, we have developed a computational framework based on multiple algorithms for identifying TIIC scores. The specific steps are as follows:

Determining Potential Immune-Related RNAs: Identifying the top 15% of RNAs based on expression values as potential immune-related RNAs.

Using Tissue Specificity Index (TSI) [29] to Determine Potential Immune-Related RNAs: TSI is calculated using the formula:

$$\text{TSI}_{\text{RNA}} = \frac{\sum_{i=1}^{N} \left( 1 - x_{RNA,i} \right)}{N - 1}$$

where N represents the number of immune cell types and $x_{RNA,i}$ represents the expression intensity of the RNA in immune cell type i, normalized by the maximal expression value of that RNA across cell types. TSI ranges from 0 to 1, where TSI = 0 defines immune cell-general RNA and TSI = 1 defines immune cell-specific RNA. RNAs with high expression across all immune cell types are classified as immune-related universal RNAs (iuRNAs).

Defining TIIC-RNA: iuRNAs significantly upregulated in immune cell types and downregulated in tumor cells are defined as TIIC-RNA.

Utilizing ML for Classification: Employing ML algorithms including Boruta, eXtreme gradient boosting (Xgboost), least absolute shrinkage and selection operator regularized logistic regression (LassoLR), support vector machine (SVM), random forest (RF), and prediction analysis for microarrays (Pamr) to screen for the most valuable TIIC-RNAs.
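As a worked illustration of the TSI formula in the steps above, the following minimal Python sketch computes the index for a single RNA. The function name `tissue_specificity_index` and the expression values are hypothetical; the only elements taken from the text are the formula itself and the normalization of each value by the maximal expression across immune cell types.

```python
def tissue_specificity_index(expression_by_cell_type):
    """TSI = sum_i (1 - x_i) / (N - 1), with x_i = expression_i / max expression."""
    values = list(expression_by_cell_type)
    n = len(values)
    if n < 2:
        raise ValueError("TSI needs at least two cell types")
    peak = max(values)
    if peak <= 0:
        raise ValueError("RNA is not expressed in any cell type")
    normalized = [v / peak for v in values]
    return sum(1.0 - x for x in normalized) / (n - 1)


# Hypothetical mean expression of one RNA in four immune cell types.
broad = tissue_specificity_index([10.0, 8.0, 9.0, 7.0])    # close to 0.2
uniform = tissue_specificity_index([5.0, 5.0, 5.0, 5.0])   # immune cell-general
specific = tissue_specificity_index([8.0, 0.0, 0.0, 0.0])  # immune cell-specific

print(round(broad, 3), uniform, specific)
```

An RNA expressed at similar levels in every cell type yields a TSI near 0, whereas an RNA confined to one cell type yields a TSI of 1.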

2.4 Construction of TIIC-signature by machine learning-based ensemble methods

The generation of TIIC-signature using machine learning-based ensemble methods involves the following steps:

Screening Candidate Prognostic TIIC-RNAs through Univariate Cox Proportional Hazards Regression Analysis.

Further Evaluating the Importance of Prognostic TIIC-RNAs using Three ML Survival Algorithms: Random Survival Forest (RSF), Lasso Regularized Cox Regression (LassoCox), and Likelihood-Based Boosting for Cox Models (CoxBoost).

Processing the Scores using 20 ML Algorithms, which include RSF, Conditional Random Forests (CForest), LassoCox, Elastic Net Regression (Enet), Ridge Regression, Gradient Boosting with Regression Trees (BlackBoost), Regression for Parametric Survival Model (SurvReg), Conditional Inference Trees (CTree), Cox Proportional Hazards Model (CoxPH), Oblique Random Survival Forest (ObliqueRSF), Stepwise Cox Model (StepwiseCox), Survival Support Vector Machine (SurvivalSVM), Generalized Boosted Regression Models (GBM), Ranger, Partial Least Squares Regression for Cox Models (PlsRcox), Gradient Boosting with Component-Wise Linear Models (GlmBoost), Supervised Principal Components (SuperPC), Akritas Conditional Non-Parametric Survival Estimator (Akritas), CoxBoost, and Recursive Partitioning and Regression Trees (Rpart), aiming to determine the most reliable model based on comprehensive Concordance index (C-index).

Developing the TIIC Signature Score from the Prognostic TIIC-RNAs using the SuperPC Algorithm. We then plotted receiver operating characteristic (ROC) curves to assess model performance; an area under the curve (AUC) greater than 0.6 was considered to indicate good performance.
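The model comparison in the steps above relies on the concordance index. The following is a minimal sketch of Harrell's C-index in plain Python, followed by a toy selection of the model with the highest mean C-index. Interpreting the "comprehensive C-index" as a simple mean across cohorts is an assumption, and the model names, cohort names, and values in `toy_cindex` are invented and are not results from this study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ordered correctly.

    A pair (i, j) is comparable when subject i had an observed event and a
    shorter follow-up time than subject j. The pair is concordant when the
    subject with the earlier event has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def mean_cindex(per_cohort):
    """Average C-index across cohorts (assumed aggregation rule)."""
    return sum(per_cohort.values()) / len(per_cohort)


# Toy cohort: survival times, event indicators (1 = death), risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]
c_good = concordance_index(times, events, risk)        # perfectly ordered
c_bad = concordance_index(times, events, risk[::-1])   # perfectly reversed

# Invented values, used only to show the selection step.
toy_cindex = {
    "model_A": {"cohort_1": 0.62, "cohort_2": 0.66},
    "model_B": {"cohort_1": 0.58, "cohort_2": 0.61},
}
best_model = max(toy_cindex, key=lambda m: mean_cindex(toy_cindex[m]))
print(c_good, c_bad, best_model)
```

A C-index of 0.5 corresponds to random ordering, so values are usually read relative to that baseline.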

2.5 Cell annotation analysis

We first annotated cells using the following cell markers: Epithelial Cell Markers ("EPCAM", "KRT18", "KRT19", "CDH1"), Fibroblast Cell Markers ("DCN", "THY1", "COL1A1", "COL1A2"), Endothelial Cell Markers ("PECAM1", "CLDN5", "FLT1", "RAMP2"), T Cell Markers ("CD3D", "CD3E", "CD3G", "TRAC"), NK Cell Markers ("NKG7", "GNLY", "NCAM1", "KLRD1"), B Cell Markers ("CD79A", "IGHM", "IGHG3", "IGHA2"), Myeloid Cell Markers ("LYZ", "MARCO", "CD68", "FCGR3A"), Mast Cell Markers ("KIT", "MS4A2", "GATA2"). Based on these cell markers, we conducted a series of graphical analyses, including t-SNE plots and violin plots of cell markers.
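A minimal sketch of how such marker lists can be turned into a cluster label is given below. The scoring rule (mean marker expression per cell type, highest score wins) is an assumption chosen for illustration and is not the procedure used in this study, which relied on marker plots and Sc-Type. The toy cluster profile is invented.

```python
# Marker lists copied from the text above.
MARKERS = {
    "Epithelial": ["EPCAM", "KRT18", "KRT19", "CDH1"],
    "Fibroblast": ["DCN", "THY1", "COL1A1", "COL1A2"],
    "Endothelial": ["PECAM1", "CLDN5", "FLT1", "RAMP2"],
    "T cell": ["CD3D", "CD3E", "CD3G", "TRAC"],
    "NK cell": ["NKG7", "GNLY", "NCAM1", "KLRD1"],
    "B cell": ["CD79A", "IGHM", "IGHG3", "IGHA2"],
    "Myeloid": ["LYZ", "MARCO", "CD68", "FCGR3A"],
    "Mast cell": ["KIT", "MS4A2", "GATA2"],
}


def annotate_cluster(mean_expression, markers=MARKERS):
    """Label a cluster by the cell type with the highest mean marker expression.

    mean_expression maps gene symbol -> average normalized expression in the
    cluster; genes missing from the mapping are treated as 0 (assumption).
    """
    scores = {
        cell_type: sum(mean_expression.get(g, 0.0) for g in genes) / len(genes)
        for cell_type, genes in markers.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Invented cluster profile for demonstration.
cluster = {"CD3D": 2.5, "CD3E": 2.1, "TRAC": 1.8, "NKG7": 0.6, "LYZ": 0.1}
label, scores = annotate_cluster(cluster)
print(label)
```

In practice, such a rule is only a first pass, and ambiguous clusters still need inspection of the marker plots.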

We then segregated the clustering of immune cells and performed cell annotation analysis separately. For automated annotation, we employed the Sc-Type software.

2.6 Functional annotation of TIIC signature

We utilized multiple algorithms to quantify immune infiltrates in tumors: the Tumor Immune Estimation Resource (TIMER) algorithm for 6 immune cell types [30], the single-sample gene set enrichment analysis (ssGSEA) algorithm for 22 immune cell types [31], the Microenvironment Cell Populations-counter (MCPcounter) algorithm for 10 immune cell types [32], and the Estimation of Stromal and Immune cells in Malignant Tumors using Expression data (ESTIMATE) algorithm [33]. Additionally, we conducted Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis using Gene Set Variation Analysis (GSVA) [34] and Gene Set Enrichment Analysis (GSEA), as well as enrichment analysis using Metascape [35]. Furthermore, we used GSVA to quantify 114 metabolic pathways from previous literature as well as metabolic pathways from the KEGG database.

2.7 Prediction of immunotherapy response

For the prediction of immunotherapy response, we collected multiple datasets, including Nathanson (melanoma) [36], GSE35640 (melanoma) [37], GSE91061 (melanoma) [38], GSE78220 (melanoma) [39], IMvigor210 (urothelial carcinoma, UC) [27], Braun (renal cell carcinoma, RCC) [40], GSE179351 (colorectal adenocarcinoma and pancreatic adenocarcinoma, COAD and PAAD) [41], GSE165252 (esophageal adenocarcinoma, ESCA) [42], GSE103668 (triple-negative breast cancer, TNBC) [43], and GSE126044 (non-small cell lung cancer, NSCLC) [39], among others. In each dataset, we computed the TIIC signature score to predict the response to immunotherapy. Additionally, we utilized the Tumor Immune Dysfunction and Exclusion (TIDE) online analysis tool (http://tide.dfci.harvard.edu/) to predict the immune response and scores in TCGA dataset.

2.8 Gene differential analysis and GSEA analysis

To assess the functional characteristics of different TIIC groups, we conducted gene differential analysis using the "limma" package for two groups. The criteria for selecting differentially expressed genes were a p-value less than 0.05 and an absolute value of log2FC greater than 0.5. Subsequently, we performed GSEA analysis on the upregulated genes in both TIIC groups using the "clusterProfiler" package [44]. In the GSEA analysis, we utilized relevant gene sets from the KEGG and GO Biological Process (GOBP) pathways from the MSigDB database [45]. Enrichment was considered significant if the p-value was less than 0.05 after Benjamini–Hochberg correction. Finally, we visualized the results using the "enrichplot" package.
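The two thresholding steps described above can be sketched in plain Python: selecting DEGs by p-value and absolute log2FC, and applying the Benjamini–Hochberg adjustment. This is not the "limma" or "clusterProfiler" implementation; the helper names, gene labels, and numeric values are hypothetical.

```python
def select_degs(results, p_cutoff=0.05, lfc_cutoff=0.5):
    """Keep genes with p < p_cutoff and |log2FC| > lfc_cutoff.

    results maps gene -> (log2_fold_change, p_value).
    """
    return [
        gene
        for gene, (log2fc, p) in results.items()
        if p < p_cutoff and abs(log2fc) > lfc_cutoff
    ]


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented differential-expression results.
toy_results = {
    "GENE_A": (1.2, 0.001),    # passes both thresholds
    "GENE_B": (0.3, 0.0001),   # significant but effect size too small
    "GENE_C": (-0.9, 0.02),    # passes (downregulated)
    "GENE_D": (2.0, 0.30),     # large effect but not significant
}
degs = select_degs(toy_results)

# Invented enrichment p-values for four gene sets.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [i for i, p in enumerate(adj_p) if p < 0.05]
print(degs, significant)
```

In this toy case, three raw p-values fall below 0.05 but only one remains significant after adjustment, which is why the correction matters when many gene sets are tested.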

2.9 Comparison of genetic variants landscape

We used the "maftools" package [46] in R to process mutation data and examined the differences in mutation burden between the high-risk and low-risk groups. Using maftools, we generated waterfall plots of the top 30 mutated genes in the high-risk and low-risk groups and conducted chi-square tests to analyze differences in gene mutation frequencies between the two groups. Copy number variation (CNV) data were processed using GISTIC 2.0. We first identified significantly amplified and deleted chromosomal segments and evaluated differences in CNVs across chromosomes. We also calculated the fraction of genome altered (FGA), the fraction of genome gained (FGG), and the fraction of genome lost (FGL). Finally, we visualized these CNV results using the "ggplot2" package.
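The text does not give formulas for FGA, FGG, and FGL, so the following sketch rests on assumptions: each fraction is taken as the summed length of altered, gained, or lost segments divided by the total length of all profiled segments, and a segment counts as gained or lost when its log2 copy ratio exceeds +0.2 or falls below -0.2. Both the denominator and the threshold are illustrative choices, and the segment values are invented.

```python
def genome_alteration_fractions(segments, threshold=0.2):
    """Compute FGA, FGG, and FGL from copy-number segments.

    segments: iterable of (start, end, log2_ratio) tuples.
    Assumptions: denominator is the total profiled length; a segment is
    gained if log2_ratio > threshold and lost if log2_ratio < -threshold.
    """
    total = gained = lost = 0
    for start, end, log2_ratio in segments:
        length = end - start
        total += length
        if log2_ratio > threshold:
            gained += length
        elif log2_ratio < -threshold:
            lost += length
    if total == 0:
        raise ValueError("no segments with positive length")
    return {
        "FGA": (gained + lost) / total,
        "FGG": gained / total,
        "FGL": lost / total,
    }


# Invented segments covering 100 units of genome.
toy_segments = [
    (0, 50, 0.0),     # copy-neutral
    (50, 80, 0.6),    # gained
    (80, 100, -0.5),  # lost
]
fractions = genome_alteration_fractions(toy_segments)
print(fractions)
```

Under these assumptions FGA equals the sum of FGG and FGL, which is a convenient sanity check on any real calculation.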

2.10 Statistical analysis

All data processing, statistical analysis, and plotting were performed using R 4.1.3 software. We assessed the correlation between two continuous variables using Pearson correlation coefficient. For categorical variables, we compared them using the chi-square test, while for continuous variables, we used either the Wilcoxon rank-sum test or the t-test for comparison. We used the "survminer" package to determine the optimal cutoff value and conducted Cox regression and Kaplan–Meier analysis using the "survival" package. In this study, we defined p < 0.05 as statistically significant.

3 Results

3.1 Identification of TIIC-RNA at single-cell level

Based on the BLCA scRNAseq dataset, we identified BLCA cells and six types of microenvironmental cells (Fig. 1A). We further selected BLCA cells and four types of immune cells for analysis (Fig. 1B) and displayed the t-SNE plot of immune cells and BLCA cells (Fig. 1C). The top 15% of RNAs by expression level in each immune cell type were identified as potential immune-related RNAs for that cell type, totaling 4,743 RNAs. Based on TSI scores (TSI < 0.25), we further determined 2,807 immune-related RNAs as iuRNAs. The differentially expressed genes (DEGs) among immune cells, which were used to validate the definition of cell populations, are shown in Fig. 1D. The DEGs between immune cells and BLCA cells are shown in Fig. 1E. Compared to BLCA cells, 309 DEGs significantly upregulated in immune cells were defined as TIIC-RNAs. We applied six ML classification algorithms (Boruta, Xgboost, LassoLR, SVM, RF, and Pamr) to these TIIC-RNAs and identified the 171 most valuable TIIC-RNAs (Fig. 1F).

Fig. 1

Identifying TIIC-RNAs at the single-cell level. A t-SNE plot identifying labeled BLCA cells and microenvironment cells. B t-SNE plot identifying labeled BLCA cells and 4 types of immune cells. C t-SNE plot identifying immune cells and BLCA cells. D Violin plot showing differential gene expression among different immune cells. E Volcano plot showing differential gene expression between two TIIC signature groups. F Venn diagram displaying the intersection genes identified by 6 ML algorithms for classification

3.2 Construction of the TIIC signature

We conducted univariate Cox proportional hazards regression analysis to investigate the prognostic value of TIIC-RNAs for OS in BLCA patients. The analysis identified 29 prognostic TIIC-RNAs in the TCGA dataset (Fig. 2A). Furthermore, we employed three ML algorithms for survival analysis: CoxBoost (Fig. 2B), LassoCox (Fig. 2C, D), and random survival forest (Fig. 2E, F). Eleven prognostic TIIC-RNAs were identified by all three algorithms (Fig. 2G). Subsequently, based on these 11 prognostic TIIC-RNAs, we used 20 ML algorithms for scoring and determined the most reliable model based on the comprehensive C-index in the external validation datasets; the SuperPC algorithm performed best among the 20 ML algorithms (Fig. 2H). TCGA-BLCA patients with high TIIC signature scores exhibited poorer survival outcomes (p < 0.05, Fig. 2I). Time-dependent ROC curves for 1- to 5-year OS demonstrated the prognostic value of TIIC signature scores in TCGA-BLCA (AUC = 0.719, 0.706, 0.713, 0.702, and 0.693, respectively). AUC values for the other datasets are shown in Fig. 2J.

Fig. 2

Developing a risk signature based on machine learning. A Univariate Cox regression analysis on 29 overlapping genes. B Dimensionality reduction of 29 prognostic genes using CoxBoost algorithm. C, D Dimensionality reduction of 29 prognostic genes using Lasso algorithm. E, F Dimensionality reduction of 29 prognostic genes using random survival forest algorithm. G Venn diagram displaying the intersection prognostic genes identified by three survival ML algorithms. H Among the 20 machine learning methods, SuperPC was selected for TIIC signature modeling. I Kaplan–Meier survival curves showing the association between TIIC signature scores and overall survival (OS) in TCGA, GSE48075, GSE31684, GSE13507, GSE160693, GSE176307, Xiangya_Cohort and GSE32894 datasets. J Time-dependent ROC curves showing the association between TIIC signature scores and 1- to 5-year OS in TCGA, GSE48075, GSE31684, GSE13507, GSE160693, GSE176307, Xiangya_Cohort and GSE32894 datasets

3.3 Comparison of prognostic value between TIIC signature and previous features

In the TCGA dataset, we observed significant correlations between TIIC signature scores and survival status, tumor stage, and the TNM staging system (Fig. 3A). Furthermore, in the TCGA dataset, the TIIC signature score achieved a higher C-index than age, gender, tumor stage, and the TNM staging system (Fig. 3B). To further evaluate the prognostic performance of TIIC signature scores, we included 52 prognostic models reported in other literature and compared the C-indexes in TCGA-BLCA and other validation datasets (Fig. 3C). These 52 prognostic models were associated with various biological features. Our TIIC model performed better than most other published models in both TCGA-BLCA and the other validation datasets.

Fig. 3

Prognostic value of TIIC signature scores. A Circos plot showing different clinical factors in two TIIC signature groups. B Concordance index (C-index) of TIIC signature scores and various clinical factors in TCGA-BLCA, GSE13507, GSE31684, GSE32894, GSE48075, GSE160693, GSE176307 and Xiangya_Cohort datasets. C C-index of TIIC signature scores and 52 BLCA models in TCGA-BLCA, GSE13507, GSE31684, GSE32894, GSE48075, GSE160693, GSE176307 and Xiangya_Cohort datasets

3.4 Biological mechanisms associated with prediction of TIIC signature score

Given the upregulation of immune-related features observed in the low-scoring group, we aimed to delve deeper into potential biological mechanisms. The TIIC signature score exhibited strong positive correlations with most pathways, such as adherens junction, the TGF-β signaling pathway, and Hippo signaling (Fig. 4A). We selected eight pathways with significant differences between the two groups from the GOBP and KEGG databases and showed their sample t-SNE plots along with the corresponding ssGSEA scores (Fig. 4B). Enrichment analysis in Metascape of genes upregulated in the high TIIC group revealed associations with inflammatory response, cell–cell adhesion, and other functions (Fig. 4C). GSEA results for predominant genes in the high and low TIIC groups indicated that the high TIIC group was associated with processes such as collagen fibril organization, external encapsulating structure organization, peptide encapsulating structure organization, and tissue development, while the low TIIC group was associated with growth, regulation of hormone levels, secretion, and signal release (Fig. 4D, E).

Fig. 4

Biological characteristics of TIIC signature scores in the TCGA dataset. A GSVA analysis based on MsigDB describes the biological characteristics of two TIIC signature score groups. B t-SNE plot of GO and KEGG term depicting differences in pathway activity between two TIIC signature score groups. C Enrichment analysis based on Metascape displays the enrichment of differentially expressed genes between two TIIC signature score groups. D, E GSEA of GO and KEGG terms for TIIC signature scores

3.5 Significant correlation between TIIC signature and immune-related features

We employed the TIMER algorithm for six immune cell types, ssGSEA for 28 immune cell types, the MCPcounter algorithm for 10 immune cell types, and the ESTIMATE algorithm to quantify immune infiltrating cells. We observed a decreasing trend in the activity of most immune cells as the TIIC score increased (Fig. 5A). Additionally, we compared the correlation of the TIIC signature score with immune regulatory factors, as shown in Fig. 5B.

Fig. 5

Immune characteristics of TIIC signature scores in the TCGA dataset. A Correlation between TIIC signature scores and immune infiltrating cells. B Correlation between TIIC signature scores and immune regulatory factors

3.6 Validating the predictive value of TIIC signature score for immunotherapy response across multiple datasets

Given the predictive ability of the TIIC signature score for the benefits of immunotherapy, we validated its effectiveness in multiple immunotherapy datasets. In the IMvigor cohort, patients with high TIIC signature scores exhibited better survival outcomes (p = 0.00087, Fig. 6A), but there was no significant difference in TIIC signature scores between the response and progression groups after immunotherapy (p = 0.77, Fig. 6B). In the COAD and PAAD cohorts, TIIC signature scores in the group that responded to immunotherapy were significantly higher than those in the progression group (p = 0.011, Fig. 6C). In the GSE35640 melanoma cohort, there was no significant difference in TIIC signature scores between the response and progression groups (p = 0.43, Fig. 6D). In the Braun RCC cohort, patients with high TIIC signature scores exhibited better survival outcomes (p = 0.038, Fig. 6E), but there was no significant difference in TIIC signature scores between the response and progression groups (p = 0.8, Fig. 6F). Likewise, no significant difference in TIIC signature scores between the response and progression groups was found in the GSE91061 melanoma cohort (p = 0.52, Fig. 6G) or the GSE103668 TNBC cohort (p = 0.69, Fig. 6H). In the Nathanson melanoma cohort, there was no significant difference in survival outcomes between the high and low TIIC signature score groups (p = 0.25, Fig. 6I), and no significant difference in TIIC signature scores between the response and progression groups (p = 0.7, Fig. 6J). In the GSE78220 cohort, patients with high TIIC signature scores exhibited better survival outcomes (p = 0.03, Fig. 6K), but there was no significant difference in TIIC signature scores between the response and progression groups (p = 0.93, Fig. 6L). No significant difference in TIIC signature scores between the response and progression groups was observed in the GSE126044 NSCLC cohort (p = 0.44, Fig. 6M) or the GSE165252 ESCA cohort (p = 0.83, Fig. 6N). In the TCGA-BLCA dataset, a higher proportion of non-responders was observed in the high TIIC group (p = 6.32e−08, Fig. 6O).

Fig. 6

Predicting immunotherapy response in various tumors based on TIIC signature scores. A Kaplan–Meier (KM) survival curves show the association between TIIC signature scores and overall survival (OS) in BLCA (IMvigor). B The association between TIIC signature scores and immunotherapy response in BLCA (IMvigor). C Differences in TIIC signature scores among different immunotherapy response groups in COAD and PAAD (GSE179351). D Differences in TIIC signature scores between responders and non-responders to immunotherapy in melanoma (GSE35640). E The predictive role of TIIC signature scores for OS in the Braun dataset. F Differences in TIIC signature scores between different immunotherapy response groups in the Braun dataset. G Comparison of TIIC signature scores among different immunotherapy response groups in melanoma (GSE91061). H The impact of TIIC signature scores on treatment response in TNBC (GSE103668). I KM survival curves showing the association between TIIC signature scores and OS in melanoma (Nathanson dataset). J Differences in TIIC signature scores between immunotherapy response groups in melanoma (Nathanson dataset). K KM survival curves showing differences in OS between high and low TIIC signature score groups in melanoma (GSE78220). L Distribution of TIIC signature scores among immunotherapy response groups in melanoma (GSE78220). M Distribution of TIIC signature scores among immunotherapy response groups in NSCLC (GSE126044). N The association between TIIC signature scores and immunotherapy response in ESCA (GSE165252). O Differences in the proportion of responders and non-responders in the TCGA-BLCA dataset predicted by the TIDE algorithm based on TIIC signature scores

In summary, the associations of the TIIC score with prognosis and with immunotherapy response are heterogeneous across tumor types. High TIIC scores were associated with better survival in the immunotherapy-treated BLCA (IMvigor) and RCC (Braun) cohorts. In melanoma, the effect of high TIIC scores on prognosis remains undetermined. In COAD and PAAD, patients with higher TIIC signature scores exhibited a better response to immunotherapy. Conversely, in TCGA-BLCA, higher TIIC scores were associated with a higher proportion of non-responders as predicted by TIDE. For other tumor types, no significant correlation between TIIC scores and immunotherapy efficacy was observed.

3.7 Predicting metabolic features associated with TIIC signature scores

To investigate the extensive metabolic features between the two TIIC signature score groups, we conducted GSVA for metabolic pathways from the KEGG database. TIIC signature scores showed significant correlations with numerous metabolic pathways (Fig. 7A). Notably, Biosynthesis of unsaturated fatty acids and Sphingolipid metabolism exhibited significantly higher activation rates in the high TIIC signature score group, while Glycerophospholipid metabolism and Fatty acid degradation showed significantly higher activation rates in the low TIIC signature score group (p < 0.001, Fig. 7B). Additionally, TIIC signature scores were negatively correlated with pathways such as Fatty acid biosynthesis, Glycerophospholipid metabolism, and Sulfur metabolism, while positively correlated with Sphingolipid metabolism (p < 0.001, Fig. 7C).

Fig. 7

Metabolic characteristics of TIIC signature scores in the TCGA dataset. A GSVA analysis based on KEGG depicting metabolic pathways of 11 metabolic categories in two TIIC signature score groups. B Differences in pathways between two TIIC signature score groups. C Correlation between TIIC signature scores and literature-based GSVA metabolic pathway analysis

4 SNV mutation differential analysis and CNV differential analysis

We observed different frequencies of chromosomal alterations between the two TIIC signature score groups (Fig. 8A). The waterfall plot illustrates the mutation status of the top 30 genes in both risk groups, revealing TP53 (49%), TTN (43%), and KMT2D (25.9%) as genes with higher mutation rates (Fig. 8B). The high TIIC signature score group exhibited higher chromosomal instability, characterized by FGA, FGG, and FGL, although no significant changes were observed statistically (Fig. 8C–E). Significant differences were observed in CNV mutations in chr3 between the two groups (Fig. 8F).

Fig. 8

Multi-omics changes associated with TIIC signature scores in the TCGA dataset. A Chromosome amplifications and deletions based on GISTIC 2.0 in two TIIC signature score groups. B Genomic alteration landscape in two TIIC signature score groups. C Proportions of genomic alterations, proportions of genomic gains, and proportions of genomic losses in two TIIC signature score groups. D Distribution of changes in Chromosome 3 (Chr3) in two TIIC signature score groups. E The difference of Fraction Genome Lost in TIIC low and high group. F The difference of Chr3 Alteration in TIIC low and high group

5 Discussion

BLCA is a common and highly heterogeneous malignant tumor. According to the World Health Organization (WHO), hundreds of thousands of people are diagnosed with BLCA every year, with some patients missing the optimal treatment window due to late diagnosis, leading to disease progression and deterioration [1]. Despite advances in TURBT or BCG instillation therapy, about 30% of non-muscle-invasive BLCA (NMIBC) patients will eventually progress to muscle-invasive BLCA (MIBC) [3]. Approximately 50% of MIBC patients will experience local or systemic recurrence after radical cystectomy, and some may even succumb to the disease [4]. In recent years, immunotherapy has garnered attention as a novel treatment strategy, particularly PD-1/PD-L1 inhibitors showing promise in the treatment of advanced BLCA [47]. However, many patients still experience immunotherapy failure, making accurate prediction of immunotherapy outcomes in BLCA patients a focus of current research.

Studies have shown that various factors influence the efficacy of immunotherapy, including tumor immunogenicity, TME, immune inhibitory factors, tumor mutation burden, patient immune status, and tumor genotype [48,49,50,51]. Among these factors, the TME plays a significant role in the efficacy of tumor immunotherapy. TIICs, as crucial components of the TME, play roles in immune surveillance, direct killing of tumor cells, regulation of immune responses, and involvement in complex immune regulatory networks [52,53,54]. Therefore, it is necessary to explore the success or failure of tumor immunotherapy from the perspective of TIICs.

In this study, we identified TIICs from BLCA scRNA-seq data and used optimized algorithms to select immune-related RNAs. BLCA TIIC-RNAs were determined based on calculated TSI scores. Subsequently, we employed various ML algorithms to screen and establish a TIIC signature consisting of 11 TIIC-RNAs (COL6A1, DOCK8, TNFRSF14, MAN2A1, ORMDL1, ADCY7, MAML2, SND1, SLFN11, UBAC2, FAM111A). Higher TIIC signature scores were associated with poorer OS, and the prognostic value of the TIIC signature was superior to that of clinical parameters such as age, gender, and the TNM staging system. We further examined the application value of the TIIC signature at multiple omics levels. First, in terms of biological mechanisms, the TIIC signature showed strong correlations with various biological processes. Second, in immune cell scoring, the TIIC signature score reflected the level of immune infiltration. Additionally, the TIIC signature was significantly correlated with metabolic pathway activity. Finally, the TIIC signature was associated with differences in the genomic alteration landscape of patients.

In this study, we employed over 20 machine learning algorithms categorized into four groups: decision trees and random forests (RSF, CForest, CTree, Ranger, ObliqueRSF), linear models (LassoCox, Enet, Ridge Regression, CoxPH, StepwiseCox, PlsRcox), boosting methods (BlackBoost, GBM, CoxBoost, GlmBoost), and other approaches (SurvReg, SurvivalSVM, SuperPC, Akritas, Rpart). Decision trees and random forests excel in handling nonlinear and high-dimensional data but suffer from poor interpretability and a tendency to overfit. Linear models offer high computational efficiency and ease of interpretation but assume linear relationships and are sensitive to outliers. Boosting methods provide high predictive accuracy and robustness against overfitting, albeit with high computational complexity and challenging parameter tuning. Other methods, such as SurvReg and SurvivalSVM, add diversity and specific advantages but have narrower applicability and are harder to understand and apply. By integrating these algorithms, we can leverage their respective strengths to develop more robust and powerful models.
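One common way to compare survival models of this kind is to compute a concordance index (C-index) for each model in each cohort and rank the models by their mean C-index. The sketch below illustrates that idea only; it is an assumption for illustration and not necessarily the selection procedure used in this study. The function names `concordance_index` and `rank_models` are hypothetical, and pairs with tied survival times are simply skipped for brevity.

```python
from itertools import combinations
from statistics import mean


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index.

    times: follow-up time per patient
    events: 1 if the event (death) was observed, 0 if censored
    risk_scores: model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip pairs with tied times or where the earlier patient is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def rank_models(cindex_by_model):
    """Sort models by mean C-index across cohorts, best first.

    cindex_by_model: {model_name: [C-index in cohort 1, cohort 2, ...]}
    """
    ranked = [(name, mean(values)) for name, values in cindex_by_model.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Averaging across cohorts favors models that generalize rather than models that fit a single training cohort well, which is the usual motivation for this ranking.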

Our constructed TIIC signature score is calculated based on the expression levels of COL6A1, DOCK8, TNFRSF14, MAN2A1, ORMDL1, ADCY7, MAML2, SND1, SLFN11, UBAC2, and FAM111A. It exhibited excellent performance in predicting survival outcomes, indicating the potential future utility of the TIIC signature score. Existing studies support the relevance of these genes. TNFRSF14 suppresses BLCA proliferation by promoting apoptosis, thereby inhibiting tumor cells from escaping immune surveillance [55]. The MECT1-MAML2 fusion protein activates HES1 transcription to disrupt Notch signaling, and activates CREB, thus mimicking constitutive activation of cAMP signaling [56]. SND1, as an endoplasmic reticulum-associated protein, promotes immune escape of tumor cells by directing the MHC-I heavy chain to the ER-associated degradation pathway, interfering with antigen presentation [57]. FAM111A may influence the immune response through the interferon-γ response pathway and is associated with macrophages, CD8 T cells, and other immune cells, thereby playing a significant role in the tumor immune response [58]. UBAC2 binds to BCRC-3 and subsequently affects the interaction between BCRC-3 and miR-182-5p, leading to inhibition of p27 expression and promoting BLCA proliferation [59]. COL6A1 activates multiple downstream signaling pathways, including the JAK-STAT, MAPK, NOTCH, p53, VEGF, and WNT pathways, thereby promoting tumor cell proliferation. DOCK8 positively regulates IL-2-induced STAT5 phosphorylation, promoting regulatory T cell suppression [60]. SLFN11, as a key regulator of the DNA damage response, predicts response to various DNA-damaging agents and PARP inhibitors in multiple tumors, serving as a crucial predictive biomarker and prognostic marker for various cancer types [61,62]. Knockout of the Man2a1 gene enhances response to anti-PD-L1 therapy and promotes higher levels of cytotoxic T cell infiltration in tumors under anti-PD-L1 therapy [63]. ADCY7 is closely associated with immune cell infiltration and ICG expression in various tumors [64]. ORMDL1 promotes colorectal cancer cell migration in multiple ways and inhibits invasion and proliferation [65].
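To make the idea of an expression-based score concrete, the sketch below assumes a linear, Cox-style form in which each gene's expression is multiplied by a per-gene coefficient and the products are summed, followed by a median split into high- and low-score groups. Both the linear form and the median cutoff are assumptions for illustration; the fitted model in this study may combine the genes differently. The coefficients must be supplied by the caller, and none are given here because they are not stated in this section. The function names are hypothetical.

```python
from statistics import median

# The 11 TIIC-RNAs named in the signature.
SIGNATURE_GENES = [
    "COL6A1", "DOCK8", "TNFRSF14", "MAN2A1", "ORMDL1", "ADCY7",
    "MAML2", "SND1", "SLFN11", "UBAC2", "FAM111A",
]


def signature_score(expression, coefficients):
    """Weighted sum of expression over the signature genes (assumed linear form).

    expression: {gene: expression value} for one patient
    coefficients: {gene: weight}; placeholders supplied by the caller
    """
    missing = [g for g in SIGNATURE_GENES
               if g not in expression or g not in coefficients]
    if missing:
        raise KeyError(f"missing genes: {missing}")
    return sum(coefficients[g] * expression[g] for g in SIGNATURE_GENES)


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median score.

    scores: {patient_id: signature score}
    Patients exactly at the median are labeled 'low' (an arbitrary choice).
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
```

Under this form, a higher score corresponds to higher predicted risk, which matches the reported association between higher TIIC signature scores and poorer OS.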

However, this study also has some limitations. First, due to the high heterogeneity of tumor tissues, scRNA-seq may not accurately distinguish immune cells from BLCA cells, which may result in some inconsistency in the expression patterns of these 11 TIIC-RNAs. Second, the specific biological functions and mechanisms of these 11 TIIC-RNAs in immune cell activity require further validation both in vitro and in vivo. Last, this study is based on datasets from public databases and is therefore retrospective in nature. Multicenter cohort studies are required to verify the prognostic predictive ability of the model, and more BLCA immunotherapy data are needed to validate the potential of the TIIC signature score in predicting immunotherapy response.

Overall, through comprehensive analysis of the transcriptome data of immune cells and BLCA cells, we established a TIIC signature score with great potential, which can predict both the prognosis of BLCA patients and the efficacy of immunotherapy. As a newly confirmed predictive biomarker, the TIIC signature score enables more accurate identification of BLCA patients who benefit from immunotherapy, and it should be clinically validated and applied as soon as possible.