Introduction

Ischemic stroke (IS) is a frequently diagnosed disease and one of the main causes of death and permanent disability worldwide, exerting a heavy burden on both patients and public health systems [1]. Despite the volume of research on this devastating disease that has been performed around the world, no breakthroughs have been translated into therapeutic use [2]. The difficulties in neural regeneration and functional recovery after IS may be attributed to its complex pathological alterations. Therefore, elucidating the underlying pathological mechanisms of IS may optimize treatment strategies, thus improving patient prognosis [3].

Accumulating evidence has suggested that unbalanced immunity and inflammation play critical roles in the progression of IS [4]. Because the brain relies heavily on the continuous supply of oxygen and the delivery of glucose through the bloodstream, a sudden interruption in the cerebral blood supply and subsequent reperfusion can lead to irreversible brain damage and quickly trigger sterile inflammation [5, 6]. During the development of IS, various adaptive and innate immune cells can more easily access ischemic brain regions from the peripheral circulation because of disruption of the blood–brain barrier, further initiating a pro-inflammatory cascade that results in additional and extensive neurological injury. In recent years, some studies have revealed that neurological damage can be limited and that functional recovery can be promoted by controlling the immune-inflammatory response in the central nervous system (CNS) microenvironment [7]. However, reliable biomarkers for IS have yet to be established. Therefore, it is necessary to further investigate the genetic alteration patterns of ischemic regions to identify more promising targets.

In recent years, high-throughput technologies have become a powerful way to explore molecular mechanisms and identify the key genes involved in IS [7, 8]. However, ethical constraints make it difficult to obtain ischemic brain tissue from patients for RNA sequencing or microarray analysis. As an alternative, brain tissue can be collected from model organisms under optimal conditions. Middle cerebral artery occlusion (MCAO) models have been developed in animals to mimic human stroke injury. Rats are commonly used in stroke research because their cerebral vasculature and physiology are similar to those of humans [9,10,11,12]. To date, many microarray datasets from rats with MCAO are available in public online data repositories [9, 10, 13]. However, individual datasets normally include a limited number of samples. Integrating multiple datasets from different experiments not only increases the sample size but also improves the robustness of the results, which is of great value for exploring the molecular changes after IS in depth [7, 14, 15].

For researchers, screening the vast amount of high-throughput data for the molecular targets that play vital roles in IS progression is undoubtedly a challenge [16]. As a valuable tool in systems biology, weighted gene co-expression network analysis (WGCNA) has increasingly been applied to generate co-expressed gene modules linked to specific disease mechanisms from massive amounts of gene expression data [17]. In addition, least absolute shrinkage and selection operator (LASSO) regression is a broadly applied machine learning approach that is suitable for improving the predictive value and accuracy of key genes discovered from high-throughput data [18]. Therefore, the integration of WGCNA and LASSO regression can effectively screen for important genes associated with IS.

MicroRNAs (miRNAs) are a class of small noncoding RNAs that negatively regulate gene messenger RNA (mRNA) expression at the posttranscriptional level through base pairing with the 3′UTR of target mRNAs [19]. As widely reported, the dysregulation of the expression of numerous miRNAs is related to pathophysiological processes after IS [20]. Therefore, elucidating miRNA expression profiles and constructing miRNA‒mRNA networks are important for discovering molecular markers and improving the understanding of the mechanisms underlying IS.

Therefore, the objective of this study was to identify potential biomarkers associated with IS that exhibit coordinated expression in both the brain and peripheral blood. To achieve this, four mRNA datasets and one miRNA dataset from MCAO rats were collected from the Gene Expression Omnibus (GEO) database. Integrated bioinformatics approaches were utilized to identify differentially expressed genes (DEGs) and miRNAs (DEMs), and a regulatory miRNA‒mRNA network was constructed. Furthermore, WGCNA and the LASSO algorithm were combined to identify characteristic IS-related genes. Finally, the diagnostic efficacy and expression level of these candidate genes were verified by the original training set, three external datasets, rat models, and clinical samples.

Results

Identification of DEGs and DEMs

The expression matrix of each dataset was obtained after preprocessing, including background correction and normalization. The training sets comprised four mRNA datasets (GSE36010, GSE78731, GSE61616, and GSE106931) and one miRNA dataset (GSE46266). The four mRNA datasets were then combined after batch effect removal. The combined expression matrix contained 7684 genes from 46 samples (21 MCAO samples versus 25 control samples). Differential expression analysis identified a total of 780 DEGs. Moreover, 56 DEMs were identified from GSE46266. The overall distribution of DEGs and DEMs is shown in volcano plots (Fig. 1a and 1b).

Fig. 1

Identification of differentially expressed mRNAs (DEGs) and miRNAs (DEMs) in the brains of middle cerebral artery occlusion (MCAO) model rats. a Volcano plot showing DEGs between the MCAO group and the control group. b Volcano plot of DEMs in the GSE46266 dataset. In a and b, red points indicate upregulation, blue points indicate downregulation, and gray points indicate no differential expression between the two groups

Gene Set Enrichment Analysis (GSEA)

GSEA was conducted to obtain a preliminary understanding of the main roles of all 7684 genes in the training set. The results showed that the top five biological process (BP) terms enriched in these genes were leukocyte proliferation, B-cell activation, symbiotic interaction, positive regulation of locomotion, and regulation of synaptic plasticity (Fig. 2a). In addition, the top five Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were the cell cycle, hematopoietic cell lineage, systemic lupus erythematosus, cytokine‒cytokine receptor interaction, and primary immunodeficiency (Fig. 2b). The top five items are shown in the ridgeline plots (Fig. 2c and 2d). These items were associated with immune-inflammatory responses and neural plasticity. The GSEA results that met the cutoffs of adjusted P < 0.05 and |normalized enrichment score (NES)| > 1 are presented in Supplementary Table S1.

Fig. 2

The results of the gene set enrichment analysis (GSEA). a Three representative biological processes (BPs). b Three representative Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. c The ridgeline plot shows the enrichment results of the top five gene sets of BPs. d The ridgeline plot shows the enrichment results of the top five gene sets of KEGG pathways

Identifying WGCNA Module Genes Associated with IS

WGCNA was used to identify gene clusters strongly related to IS in the combined dataset. When the soft-threshold power was set at 22, the scale-free topology fit index reached 0.85, and the adjacency matrix retained a comparatively high average connectivity (Fig. 3a). Furthermore, five co-expressed gene modules were identified using a hierarchical tree algorithm (Fig. 3b). After relating the modules to disease state by correlation analysis, we found that the turquoise module had the highest absolute correlation with IS (Fig. 3c). Figure 3d shows the relationship between module membership and gene significance for IS across the 2157 genes in the turquoise module; the correlation coefficient was 0.69 (P < 1e−200).
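To illustrate what a soft-threshold power does, the following minimal Python sketch raises absolute pairwise correlations to a power β, so that weak correlations are pushed toward zero while strong ones are retained. It is not the WGCNA R package used in this study: the function names and the toy expression values are hypothetical, and an unsigned network is assumed because the text does not state the network type.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for every gene pair.

    expr maps a gene name to its expression values across samples.
    """
    genes = list(expr)
    adj = {}
    for g in genes:
        for h in genes:
            if g != h:
                adj[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adj

def connectivity(adj, gene):
    """Sum of adjacencies between one gene and all other genes."""
    return sum(w for (g, _), w in adj.items() if g == gene)

# Hypothetical toy data: geneB tracks geneA perfectly, geneC only weakly.
toy_expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 1, 4, 2, 3],
}
toy_adj = soft_threshold_adjacency(toy_expr, beta=22)
```

With a power as high as 22, the weak geneA/geneC correlation contributes almost nothing to connectivity, whereas the perfectly correlated geneA/geneB pair keeps an adjacency of 1.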

Fig. 3

Weighted gene co-expression network analysis. a Analysis of network topology for various soft-threshold powers. b Hierarchical clustering tree of genes, displaying the network and the five identified modules. c Heatmap of module-trait associations. d Scatter plot of module membership versus gene significance for IS in the turquoise module

Exploration of the Target Genes of DEMs

In total, 10,849 target mRNAs were predicted by the miRWalk database to have binding sites for 14 of the 56 DEMs mentioned above. Among the 14 DEMs, eight were upregulated, and six were downregulated. We intersected these 10,849 genes with the 780 DEGs and 2157 turquoise module genes, retaining 354 overlapping genes as IS-related key genes for further analysis (Fig. 4a). Subsequently, these 354 intersecting genes were used to construct miRNA‒mRNA subnetworks (Fig. 4b and 4c; Supplementary Table S2).
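The three-way overlap described above is a plain set intersection. The sketch below shows the operation in Python under stated assumptions: the helper name `intersect_gene_sets` is hypothetical, and the toy sets contain placeholder symbols (GeneA, GeneB, GeneC) alongside two genes reported later in this study; they are not the real 780-, 2157-, and 10,849-gene lists.

```python
def intersect_gene_sets(*gene_sets):
    """Return the sorted list of genes present in every input collection."""
    sets = [set(s) for s in gene_sets]
    return sorted(set.intersection(*sets))

# Hypothetical toy inputs standing in for the three real gene lists.
toy_degs = {"Tgfb1", "Vcan", "GeneA"}
toy_turquoise_module = {"Tgfb1", "Vcan", "GeneB"}
toy_mirna_targets = {"Tgfb1", "Vcan", "GeneC"}

toy_key_genes = intersect_gene_sets(toy_degs, toy_turquoise_module, toy_mirna_targets)
```

Only genes supported by all three lines of evidence (differential expression, module membership, and predicted miRNA targeting) survive the intersection.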

Fig. 4

Exploration of the key ischemic stroke-related genes. a Venn diagram of the intersection of differentially expressed mRNAs (DEGs), genes of the turquoise module, and target mRNAs of differentially expressed miRNAs. b miRNA‒mRNA subnetworks for upregulated DEMs. c miRNA‒mRNA subnetworks for downregulated DEMs

Functional Enrichment Analysis

A functional enrichment analysis of the 354 intersecting genes was carried out. The BP terms included leukocyte-mediated immunity, positive regulation of leukocyte activation, adaptive immune response, leukocyte migration, and leukocyte activation involved in an unbalanced immune response. In terms of cellular components (CCs), there was predominant enrichment in the external side of the plasma membrane, membrane rafts, membrane microdomains, extracellular matrix, external encapsulating structure, and so on. In relation to molecular function (MF) terms, the genes were markedly enriched in cell adhesion molecule binding, cytokine receptor binding, kinase regulator activity, integrin binding, immune receptor activity, and so on (Supplementary Table S3). The top 10 items are shown in the star bar plots (Fig. 5a–5c). The enriched KEGG terms included viral protein interaction with cytokine and cytokine receptor, NF-κB signaling pathway, lipids and atherosclerosis, and cytokine‒cytokine receptor interaction (Supplementary Table S4). The top 10 KEGG pathways are shown in bubble plots (Fig. 5d). The majority of the results of the functional enrichment analysis were associated with immune and inflammatory responses.

Fig. 5

Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analyses. a–c Top 10 terms of GO functional enrichment analysis results, including biological processes, cellular components, and molecular functions. d Top 10 results for KEGG pathway analysis

Screening and Diagnostic Performance of IS Biomarkers

To identify IS-related characteristic genes, we applied LASSO regression analysis to the 354 intersecting genes, which yielded six genes for subsequent research: Tgfb1, Fc gamma receptor 2b (Fcgr2b), versican (Vcan), leukocyte immunoglobulin-like receptor B4 (Lilrb4), cholinergic receptor, muscarinic 1 (Chrm1), and chymotrypsin-like elastase 1 (Cela1) (Fig. 6a and b). Receiver operating characteristic (ROC) curve analysis showed that the area under the ROC curve (AUC) values of all six genes were greater than 0.9 in the training set (Fig. 7). ROC curve analysis was then applied to assess the diagnostic value of these six genes in the GSE97537, GSE97533, and GSE66724 validation sets (Fig. 8a–c). Only one mRNA, Tgfb1, had an AUC greater than 0.7 in all validation sets. These results indicate that Tgfb1 has good diagnostic performance in distinguishing the IS group from the control group in both rats and human participants.
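For readers unfamiliar with the AUC, it equals the probability that a randomly chosen case has a higher marker value than a randomly chosen control, with ties counted as one half. The following Python sketch computes it by direct pairwise comparison. It is an illustration rather than the software used in this study; the function name and the toy expression values are hypothetical, and it assumes that higher expression indicates disease.

```python
def roc_auc(case_values, control_values):
    """AUC as the fraction of case/control pairs ranked correctly.

    A pair counts 1 when the case value exceeds the control value
    and 0.5 when the two values are tied.
    """
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))

# Hypothetical relative expression values for a single marker gene.
toy_cases = [2.1, 3.0, 2.8, 1.9]
toy_controls = [1.0, 2.0, 1.5, 2.5]
toy_auc = roc_auc(toy_cases, toy_controls)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which gives context to the 0.7 and 0.9 thresholds quoted above.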

Fig. 6

Identification of biomarkers for ischemic stroke by least absolute shrinkage and selection operator (LASSO) regression. a LASSO regression analysis identified six characteristic genes associated with ischemic stroke with minimum lambda values and nonzero parameters. b The LASSO coefficient profile plot was produced against the log(lambda) sequence in the metadata cohort. Each curve corresponds to the trajectory of each gene

Fig. 7

Receiver operating characteristic (ROC) curve analysis of characteristic ischemic stroke-related genes in the training sets GSE36010 combined with GSE78731, GSE61616, and GSE106931. ROC curve analysis of a Tgfb1, b Fcgr2b, c Vcan, d Lilrb4, e Chrm1, and f Cela1

Fig. 8

Validation of characteristic ischemic stroke-related genes in the validation sets. Receiver operating characteristic curve analysis of these six genes in a GSE97537, b GSE97533, and c GSE66724

Validation in MCAO Rat Model

The modified neurological severity scores (mNSSs) were similar between the two groups before the operation (P > 0.05). At 1 day and 3 days after the operation, the mNSSs were greater in the MCAO group than in the control group (P < 0.05) (Fig. 9). To verify the expression level of Tgfb1 mRNA in the MCAO rat model, we collected brain and blood tissue 3 days after reperfusion. Hematoxylin and eosin (H&E) staining and Nissl staining confirmed the successful construction of the model (Fig. 10a and b). The RNA extraction and quantitative real-time polymerase chain reaction (qRT‒PCR) results revealed a significant increase in Tgfb1 mRNA in both the brain and blood tissue in the MCAO group compared with that in the control group (P < 0.05 and P < 0.01, respectively) (Fig. 11a–11b).

Fig. 9

Modified neurological severity scores (mNSSs) at various times following the operation, where a higher score indicates more severe neurological dysfunction. ns, P > 0.05, compared with the control group; **P < 0.01, compared with the control group

Fig. 10

Representative H&E- and Nissl-stained micrographs of the brain (scale bars = 1000 or 100 μm). a Representative H&E-stained transverse sections. b Nissl bodies in neurons in the brain tissue of the two groups, as observed by Nissl staining. Blue staining represents Nissl bodies; the darker the Nissl bodies or the more tabby-like their pattern, the better the neuronal state. H&E, hematoxylin and eosin. **P < 0.05, compared with the control group

Fig. 11

Animal experimental validation of ischemic stroke-related biomarkers. a Expression level of Tgfb1 mRNA in the brains of rats. b Relative mRNA expression of Tgfb1 in the blood of rats in the middle cerebral artery occlusion (MCAO) group versus the control group. *P < 0.05. **P < 0.01, compared with the control group

Verification in Clinical Cohort

Next, we performed qRT‒PCR to validate the expression of TGF-β1 mRNA in peripheral blood samples obtained from both IS patients and healthy controls (HCs). The demographic features of the participants are summarized in Table 1. These IS patients and HCs did not significantly differ in age or sex (P > 0.05). Consistent with the trend of bioinformatics analysis, qRT‒PCR revealed that TGF-β1 mRNA expression in the blood of the IS patients was greater than that in the blood of the HCs (P < 0.001) (Fig. 12a). The AUC of TGF-β1 in the clinical samples was 0.739 (95% confidence interval, 0.590–0.888) (Fig. 12b).

Table 1 Demographic features of the participants
Fig. 12

Clinical experimental validation of ischemic stroke (IS)–related biomarkers. a Relative mRNA expression of TGF-β1 in IS patients versus healthy controls. **P < 0.01. b Receiver operating characteristic curve analysis of TGF-β1 expression

Correlation Between TGF-β1 Level and Functional Outcomes

The functional outcomes of the IS patients were quantified using the modified Rankin scale (mRS) score. Based on the mRS score, the IS patients were divided into a good functional outcome group (mRS ≤ 2) and a poor functional outcome group (mRS > 2). The characteristics of the patients are summarized in Table 2. The level of TGF-β1 in patients with good functional outcomes was significantly greater than that in patients with poor functional outcomes (P < 0.05) (Fig. 13a). There was a negative correlation between the TGF-β1 level and the mRS score (r = −0.442, P < 0.05) (Fig. 13b).
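Because the mRS is an ordinal scale, a rank-based coefficient is the natural choice for relating it to a continuous marker. The sketch below shows, in Python, how Spearman's coefficient can be computed as the Pearson correlation of average ranks, together with the mRS dichotomization described above. The function names and the toy values are hypothetical and do not reproduce the patient data.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation applied to average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)

def outcome_group(mrs_score):
    """Dichotomize the mRS score as in the text: good if <= 2, poor if > 2."""
    return "good" if mrs_score <= 2 else "poor"

# Hypothetical toy values: higher marker level paired with lower mRS score.
toy_marker = [1.8, 1.5, 1.2, 0.9, 0.7]
toy_mrs = [0, 1, 2, 4, 5]
toy_rho = spearman_rho(toy_marker, toy_mrs)
toy_groups = [outcome_group(score) for score in toy_mrs]
```

A negative coefficient, as reported above, means that higher marker levels tend to accompany lower (better) mRS scores.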

Table 2 Comparison of clinical characteristics of patients with good and poor outcome
Fig. 13

The relationship between the TGF-β1 level and modified Rankin scale (mRS) score. a Blood TGF-β1 levels in patients with good functional outcomes and poor functional outcomes. *P < 0.05. b The correlation between the TGF-β1 level and the mRS score was analyzed using Spearman’s rank correlation

Discussion

Despite substantial advances in the treatment and rehabilitation of IS in recent decades, a considerable number of survivors remain severely disabled [21, 22]. Dysregulation of the immune-inflammatory response is one of the barriers to IS recovery; this phenomenon has been supported by many studies. Therefore, identifying immune-inflammation–related pathways and providing suggestions for further studies will greatly increase the chances for IS patients to receive effective rehabilitation. In the present study, six genes (Tgfb1, Fcgr2b, Vcan, Lilrb4, Chrm1, and Cela1) were revealed to be significantly correlated with IS by WGCNA and LASSO logistic regression. Finally, only one candidate gene, the Tgfb1 gene, was identified as a potential target by ROC curve analysis in the external validation sets, and this finding was statistically supported in samples from both the MCAO models and clinical cohorts.

IS is defined as the sudden interruption of arterial blood flow into the brain parenchyma and is accompanied by a loss of integrity of the blood–brain barrier [5]. In response to ischemia‒reperfusion injury, innate and adaptive immune cells are quickly recruited to ischemic brain tissue, where they produce pro-inflammatory cytokines, chemokines, and reactive oxygen species, leading to amplification of the local inflammatory response. This early activation of the immune system leads to secondary injury after IS [23]. In our study, the GSEA results implied that the genes in the combined set were primarily enriched in BPs, including the regulation of synaptic plasticity, leukocyte proliferation, and B-cell activation. Leukocytes, also known as white blood cells, are markers of inflammation and have been confirmed to be closely related to the occurrence of cerebrovascular diseases [24]. In the pathophysiological state, leukocytes can be recruited and adhere to the arterial vessel wall, thus initiating and mediating the progression of atherosclerosis. Elevated levels of leukocytes may contribute to IS either through an effect on chronic atherosclerosis or by inducing acute thrombosis (possibly by increasing the likelihood of plaque rupture) [25]. Under physiological conditions, the number of detectable B cells is very low in the brain parenchyma, but B cells are frequently trafficked to CNS tissues in response to ischemic injury following a stroke. The function of B cells in the context of stroke has not been thoroughly elucidated, although it has been proposed that B-cell–mediated responses can interfere with neuronal function and could contribute to the development of poststroke cognitive impairment [26, 27]. In addition, neuroinflammation affects the ability of neurons to exhibit synaptic plasticity. These excessive inflammatory responses can be detrimental and may lead to hindered neural repair and expanded infarct volume [28,29,30]. 

In the present study, KEGG enrichment analysis showed that the 354 IS-related key genes were enriched in the NF-κB signaling pathway, lipids and atherosclerosis, cytokine–cytokine receptor interactions, and so on. Some of these pathways have been well studied in the field of IS and have been shown to be associated with immune-inflammatory responses. For example, the transcription factor NF-κB is a core regulator of inflammatory responses post-IS [31]. The NF-κB signaling pathway can be triggered following IS, inducing the expression of target genes associated with the inflammatory response and upregulating the levels of cytokines necessary for inflammation [32]. Li et al. showed that inhibiting the NF-κB signaling pathway in the ischemic brain can alleviate neuroinflammation-induced damage and attenuate neurological deficits in MCAO rats [33]; numerous studies have shown that the inhibition of NF-κB activation is a critical target and an effective way to curb the immune-inflammatory response after IS [34]. The understanding of atherosclerosis has been redefined by the concept of an ongoing inflammatory response. Inflammation not only drives the formation and development of atherosclerotic plaques but also plays a central role in the destabilization of arterial plaques, including erosion and fissuring, thus transforming chronic atherosclerosis into an acute thromboembolic disease [35]. Therefore, approaches that attenuate the inflammatory response may be effective therapeutic strategies to combat the development of atherosclerosis [36]. Typically, following IS, multiple cytokines are released from damaged cerebral tissue. Cytokines such as tumor necrosis factor, interleukin-1, and interleukin-6 have been shown to modulate infarct evolution in rodent models of experimental stroke (focal cerebral ischemia). Hence, therapeutic strategies related to cytokines, such as the administration of cytokine antibodies or cytokine receptors, may be promising adjuvants for IS management [37, 38].

A substantial body of evidence has indicated that miRNAs, an important and pervasive class of regulatory genes, are key players in the regulation of the IS-induced immune-inflammatory response [17]. MiRNAs are noncoding 18–25-nucleotide-long RNA molecules that can negatively regulate the expression of specific target genes at the posttranscriptional level [39]. In the current study, a total of 14 miRNAs were identified from the constructed miRNA‒mRNA regulatory networks. Of these miRNAs, eight were upregulated, and six were downregulated. Several of these miRNAs have been confirmed to participate in the pathophysiological processes of IS in previous studies. For example, Zhang et al. found that the expression of miR-324-3p was downregulated both in patients with IS and in in vitro and in vivo models; they also found that silencing miR-324-3p can decrease the inflammatory response to protect against cerebral ischemic damage in MCAO rat models [40]. Mehta recently reported that the level of miR-7b was significantly reduced in the brain following transient focal ischemia in rodent models and that miR-7 supplementation promoted poststroke recovery by targeting α-Syn [41]. MiR-342-5p was reported to prevent neuronal apoptosis and attenuate cerebral ischemia–induced injuries in MCAO-treated mice via the inhibition of the Akt/NF-κB signaling pathway [42]. Additionally, several miRNA‒mRNA pairs in regulatory networks, such as miR-324-5p/TGFBR1 and miR-330-5p/CD44, have been confirmed through dual-luciferase reporter assays in previous studies [43,44,45]. Therefore, more experimental studies are warranted in the future to verify the biological functions of these miRNAs in IS and to clarify miRNA‒mRNA interactions.

Microglia, also called brain macrophages, are the primary immune cells in the CNS and are essential for preserving brain homeostasis. These immune cells can be activated within minutes of a damaging event following IS, reach peak activity 2–3 days after the injury, and linger for weeks [46, 47]. The relationship between microglia and the inflammatory response to stroke is complicated. Intralesional microglia and newly recruited microglia assume an anti-inflammatory phenotype in the initial stages of ischemia but gradually transform into a pro-inflammatory phenotype in areas next to the lesions [48]. It is important to note that, although the M1/M2 paradigm has been criticized as an oversimplification of neuroinflammation, it continues to be the model that is used most to explain the function of microglia [28]. M1-type microglia secrete pro-inflammatory cytokines, whereas M2-type microglia alter the cellular microenvironment of the ischemic brain by clearing cell debris and releasing an anti-inflammatory cytokine panel, including Tgfb1 [28, 49]. Tgfb1 belongs to the Tgfb superfamily, which consists of several highly conserved multifunctional cell–cell signaling proteins that are important regulators of cell survival, immunomodulation, and post-injury repair and that play important roles in controlling tissue homeostasis [48, 50, 51]. In the current study, Tgfb1 was identified as a promising biomarker for IS. A significant increase in the expression and release of Tgfb1 has been observed in response to CNS lesions. Such elevated levels of Tgfb1 under early stroke conditions may, in turn, induce anti-inflammatory signaling cascades and drive microglia toward an anti-inflammatory M2 polarization shift from the pro-inflammatory M1 phenotype within the ischemic brain [51]. Xin et al. reported that inhibition of the TGF-β1 signaling pathway resulted in an augmented inflammatory response, while the administration of exogenous TGF-β1 protein increased the expression level of anti-inflammatory cytokines and reduced the release of pro-inflammatory cytokines in in vitro experimental models of IS [48]. At the signaling level, TGF-β1 protein induced persistent anti-inflammatory effects on microglia, and this event may also be dependent on the activation of the canonical SMAD (small mothers against decapentaplegic)-dependent signaling pathway, the prevention of the phosphorylation of IκB kinase, and the suppression of the NF-κB-dependent pathway [52]. Li et al. found that crosstalk exists between TGF-β1 and the NF-κB signaling pathway in multiple contexts [53]. A study by Gao et al. also showed evidence that the suppression of pro-inflammatory cytokines/chemokines and critical activators of inflammation exerted by TGF-β1 is partially mediated by the blockade of NF-κB pathways [54]. In this study, the NF-κB signaling pathway was enriched by KEGG analysis, implicating this pathway as a potential downstream mechanism for the anti-inflammatory effects of TGF-β1. TGF-β1 was also suggested to be anti-atherogenic by decreasing inflammation [55]. Inhibition of TGF-β1 signaling has been shown to accelerate plaque formation and its progression toward an unstable phenotype. Our study found that the lipid and atherosclerosis pathway was also significantly enriched by KEGG analysis. Therefore, the lipid and atherosclerosis pathway may be a critical molecular pathway through which TGF-β1 participates in the pathological processes after IS. In another study, Sugimoto et al. reported that Tgfb1 expression can be highly induced in the ischemic brain of a rat stroke model [56]. Krupinski et al. showed that infarcted penumbra tissue has greater levels of TGF-β1 protein and TGF-β1 mRNA in human specimens than normal contralateral hemisphere tissue [57]. Zhang et al. also demonstrated that elevated TGF-β1 levels correlate with the extent of tissue injury in MCAO models [51]. In a prospective cohort study, Taylor observed that the concentrations of TGF-β1 in the peripheral blood of patients with stroke were associated with long-term outcomes [46]. As expected, a similar trend in TGF-β1 levels was observed in the MCAO model and the clinical patients in the current study. We also noted that patients with higher levels of TGF-β1 in peripheral blood may have better outcomes in the recovery phase after IS.

In addition to its anti-inflammatory effects, the Tgfb1 protein has been demonstrated to exert neurotrophic and neuroprotective effects [58]. Wen et al. presented data suggesting that peripheral Tgfb1 protein administration can reduce damage to the blood‒brain barrier and decrease the infiltration of neutrophils and macrophages, contributing to a reduction in neuronal damage and increasing neurological recovery [59]. In addition, several in vitro studies have linked Tgfb1 treatment to the promotion of axonal growth in several subtypes of neurons and have causally implicated Tgfb1 in rehabilitation after stroke [46]. It has also been reported that the TGF-β1 protein may reactivate chronically denervated Schwann cells in vivo and could be used to prolong regenerative responses to promote axonal regeneration [60]. There is also evidence that the TGF-β1 protein is required for the full neuroprotective activity of nerve growth factor (NGF), brain-derived neurotrophic factor (BDNF), and glial-derived neurotrophic factor (GDNF). Therefore, TGF-β1 might also promote neuronal survival and neurogenesis by acting synergistically with these neurotrophins [50]. In our study, the GSEA results supported this interpretation because the BP term of synaptic plasticity was significantly enriched. Therefore, synaptic plasticity may be one of the specific downstream mechanisms by which TGF-β1 exerts neuroprotective effects, and it merits further experimental investigation. Furthermore, a component of the neuroprotective action of Tgfb1 is mediated by inhibiting the activation of apoptotic proteins such as caspase-3, increasing the expression of antiapoptotic proteins such as Bcl-2 and Bcl-xL, and maintaining mitochondrial membrane potential [61, 62]. Blocking TGF-β1 signaling has been shown to increase the extent of infarction and deteriorate the homeostatic function of the CNS following brain ischemia in animal stroke models, whereas delivery of TGF-β1 via intranasal administration has been shown to enhance neurogenesis and improve functional outcomes [56, 63]. Wang et al. indicated that electroacupuncture at Baihui and Shuigou acupoints can further increase the level of Tgfb1 in MCAO rats to alleviate cerebral edema, attenuate infarct volume, and promote the recovery of neurological function. Hence, TGF-β1 may be a feasible and attractive molecular target for the management of IS [64].

Despite these promising findings, the current study has several limitations. First, the data analyzed were obtained from online databases and experimental rat models. Because of the nature of the disease, we could not collect brain tissue from IS patients to validate our findings. Second, we integrated four datasets from different platforms, which could have biased our results. The results were also influenced by the timing, occlusion process (permanent or transient MCAO model), and severity of the animal models. Third, although we screened one candidate hub gene, its specific function and underlying mechanism have not yet been elucidated in depth. Fourth, dual-luciferase gene reporter assays and other experiments should be used to further examine miRNA‒mRNA interactions in the regulatory network. Fifth, although we performed external validation, the sample size of the validation group was relatively small. Finally, longitudinal comparisons are needed to assess how Tgfb1 expression changes over time. Therefore, our bioinformatics findings require further evaluation in larger prospective multicenter cohort studies, and in vitro and in vivo functional assays are also urgently needed to support our findings.

Conclusions

The hub gene identified in this study, Tgfb1, exhibited stable expression patterns in both MCAO rat models and clinical IS patients. Moreover, accumulating evidence has highlighted the close association between this hub gene and IS. Notably, Tgfb1 demonstrated promising diagnostic efficacy for IS and a relationship with disease severity. These findings underscore the potential utility of Tgfb1 as a biomarker for IS diagnosis.

Materials and Methods

Data Acquisition

The mRNA and miRNA expression profile data for IS in the original training sets and the external validation sets were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo) into R software (version 4.2.3) using the GEOquery package (version 2.64.2) [65]. The training sets comprised four mRNA expression datasets (GSE36010, GSE78731, GSE61616, and GSE106931) and one miRNA expression dataset (GSE46266). The expression data in the training sets came from the brain tissue of MCAO rats. The GSE97537, GSE97533, and GSE66724 datasets were used for validation. GSE97537 and GSE97533 were sourced from the brain tissue and blood samples of MCAO rats, respectively. The GSE66724 dataset contained 16 blood samples from eight IS patients and eight HCs. More detailed information about the eight datasets is presented in Supplementary Table S5.

Data Preprocessing and Study Design

Raw data from each set were converted into an expression matrix and then processed with a standard workflow, including background correction and normalization, using the limma package (version 3.52.4) [66]. Afterward, the four training sets, comprising 21 MCAO model samples and 25 control samples, were merged and adjusted for batch effects using the ComBat function from the sva package (version 3.44.0) in R [67]. The overall workflow is shown in Fig. 14.

Fig. 14 Flow chart of the analysis

Analysis of DEGs and DEMs

The limma package was used to identify DEGs and DEMs between the MCAO group and the control group, with thresholds of |log2 fold change (FC)| > 0.58 and P < 0.05. Thereafter, volcano plots were generated with the ggplot2 package (version 3.4.2) to visualize the results [60].
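
The thresholding step can be illustrated with a minimal Python sketch. It is not the limma workflow used in the study: the result rows, gene names, and the `classify` helper are hypothetical, and the fold change is assumed to be on the log2 scale that limma reports, in which case 0.58 corresponds to roughly a 1.5-fold change.

```python
# Hypothetical limma-style result rows: (gene, log2 fold change, P value).
# The gene names and numbers are invented for illustration only.
results = [
    ("GeneA", 1.20, 0.001),
    ("GeneB", -0.75, 0.030),
    ("GeneC", 0.40, 0.002),  # significant, but below the fold-change cutoff
    ("GeneD", 0.90, 0.200),  # large change, but not significant
]

LOGFC_CUTOFF = 0.58  # |log2 FC| threshold stated in the text
P_CUTOFF = 0.05      # significance threshold stated in the text


def classify(log_fc, p_value):
    """Label one row as 'up', 'down', or 'ns' (not significant)."""
    if p_value < P_CUTOFF and abs(log_fc) > LOGFC_CUTOFF:
        return "up" if log_fc > 0 else "down"
    return "ns"


labels = {gene: classify(log_fc, p) for gene, log_fc, p in results}
print(labels)
```

The three labels map directly onto the colour groups typically drawn in a volcano plot.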

GSEA

The clusterProfiler package (version 4.8.3) [68] was used to perform GSEA on all gene expression matrices in the training set to find significantly enriched gene sets and the direction of their changes. Enrichment was considered significant at |NES| > 1 and adjusted P < 0.05. The enrichment results, ranked by NES, were visualized as ridgeline plots and GSEA enrichment plots using the ggplot2 package.

Construction of WGCNA and Identification of Modules Significantly Associated with IS

To construct the gene co-expression matrix, the Pearson correlation coefficients between gene pairs were calculated with the WGCNA package (version 1.72-1) [69]. The top 25% of genes with the highest expression variance were selected for WGCNA. Then, a soft-thresholding power of 22 and R² = 0.85 were selected as the thresholding parameters, based on the principle of a scale-free network, to construct a biological network. The resulting adjacency matrix was transformed into a topological overlap matrix, and genes were then assigned to modules by hierarchical clustering analysis. The gene modules were defined as branches of the hierarchical clustering tree, and each module was assigned a color. Finally, the correlation between each module and the sample characteristics was measured by gene significance values and module membership values to identify key modules for subsequent research.
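
The effect of the soft-thresholding power can be sketched in a few lines of Python. This is a simplified illustration, not the WGCNA implementation: it assumes an unsigned network in which the adjacency is |r| raised to the power β, and the expression values, gene labels, and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_adjacency(expr, beta):
    """Unsigned soft-threshold adjacency a_ij = |r_ij| ** beta (assumed form)."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Invented expression values across five samples (illustration only).
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.5],  # strongly correlated with g1
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly correlated with g1
}

adjacency = soft_adjacency(expr, beta=22)  # power reported in the text
for pair, weight in sorted(adjacency.items()):
    print(pair, f"{weight:.3g}")
```

A high power such as 22 leaves strongly correlated pairs almost unchanged while driving weak correlations toward zero, which is what pushes the network toward a scale-free topology.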

Predictions of miRNA‒mRNA Interactions

The target genes of the DEMs were predicted using miRWalk 2.0 (http://mirwalk.umm.uni-heidelberg.de/) [70]. The predicted DEM target genes were intersected with the DEGs and with the genes in the WGCNA key module. The intersecting genes were used as the IS-related key genes for further analysis in the current study. The miRNA‒mRNA interactions based on the DEMs and the IS-related key genes were visualized using Cytoscape (version 3.9.0) [71].
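
The overlap step amounts to a three-way set intersection followed by filtering of the predicted miRNA-target pairs. A minimal Python sketch follows; all identifiers are placeholders invented for illustration and are not results of the study.

```python
# Placeholder identifiers, invented for illustration (not study results).
predicted_pairs = [
    ("miR-a", "GeneA"),
    ("miR-a", "GeneB"),
    ("miR-b", "GeneA"),
    ("miR-b", "GeneC"),
]
degs = {"GeneA", "GeneB", "GeneD"}
key_module_genes = {"GeneA", "GeneC", "GeneD"}

# Genes predicted as targets of at least one differentially expressed miRNA.
predicted_targets = {gene for _, gene in predicted_pairs}

# Key genes: predicted targets that are also DEGs and key-module members.
key_genes = predicted_targets & degs & key_module_genes

# Edges of the miRNA-mRNA network, restricted to the key genes.
network_edges = [
    (mirna, gene) for mirna, gene in predicted_pairs if gene in key_genes
]

print(sorted(key_genes))
print(network_edges)
```

An edge list of this shape can be exported as a two-column table, which is a common input format for network visualization tools.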

Functional Enrichment Analysis

Gene ontology (GO) analysis and KEGG pathway analysis were carried out with the clusterProfiler package to characterize the biological functions of the IS-related key genes. The GO terms covered biological process (BP), cellular component (CC), and molecular function (MF). Significantly enriched results were filtered at P < 0.05 and visualized using the ggplot2 package.

Screening Potential Biomarkers Based on LASSO Regression

LASSO is a regression method that applies an L1 penalty, weighted by a coefficient λ, to shrink some coefficients to zero and thereby reduce data dimensionality. LASSO regression was performed using the glmnet package (version 4.7-1) in R to select the characteristic genes from the IS-related key genes, with 10-fold cross-validation to choose the optimal regularization parameter λ [72]. The parameters were set as follows: family = "binomial" and alpha = 1, and lambda.min was chosen as the optimal λ. Next, the predictive performance of these genes in the training and validation sets was measured by ROC curve analysis and the AUC. A gene was retained as an IS biomarker if its AUC was greater than 0.7 in all validation sets.
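
The retention rule based on the AUC can be sketched in plain Python. This is an illustration under stated assumptions rather than the ROC analysis used in the study: the AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count one half), higher expression is assumed to indicate IS, and the data and the `is_biomarker` helper are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of case/control pairs ranked correctly.

    labels: 1 = case (e.g., MCAO/IS), 0 = control. Ties count as 0.5.
    """
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def is_biomarker(auc_by_dataset, cutoff=0.7):
    """Retain a gene only if its AUC exceeds the cutoff in every dataset."""
    return all(value > cutoff for value in auc_by_dataset.values())


# Invented expression values for one gene in one dataset (illustration only).
expression = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
group = [1, 1, 1, 0, 0, 0]

example_auc = auc(expression, group)
print(round(example_auc, 3))

# Hypothetical AUCs of the same gene in three validation sets.
print(is_biomarker({"set_1": example_auc, "set_2": 0.81, "set_3": 0.75}))
print(is_biomarker({"set_1": example_auc, "set_2": 0.65, "set_3": 0.75}))
```

For a gene that is down-regulated in cases, the scores would need to be negated first; otherwise this definition yields an AUC below 0.5.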

Establishment of the Rat MCAO Model

The rat is one of the most commonly used laboratory animal species in studies of IS. To validate the IS biomarkers, an MCAO model was induced in specific pathogen-free (SPF)-grade male Sprague–Dawley rats (6–8 weeks old, approximately 250–280 g per rat) purchased from Guangdong Weitong Lihua Experimental Technology Co., Ltd. (license no. SCXK [Yue] 2022–0063) and maintained in a constant environment (22–24 °C, 12/12-h light and dark cycle, and 50–70% humidity) with food and water ad libitum. All animal care and experimental procedures were approved by the Guangxi Medical University Animal Research Ethics Committee (approval no. 202210004). All rats were allowed to acclimate for 1 week prior to the operation.

A total of 16 rats were divided into two groups of eight rats each: the control group and the MCAO group. The rats in the MCAO group underwent transient MCAO surgery under pentobarbital anesthesia, as previously described [73]. A silicone-coated nylon monofilament (diameter, 0.35–0.37 mm; Jialing Biotech Co. Ltd., Guangzhou, China) was inserted into the right middle cerebral artery from the external carotid artery and left in situ for 90 min. The rats were kept on a heating pad at 37 °C during the entire surgical procedure. The rats in the control group underwent the same operation without filament insertion. After recovery from anesthesia, the rats were returned to clean cages. Rats with neurological deficit scores of 1 to 3 points on the Longa 5-point scale at 24 h after reperfusion were considered successful IS models and were included in the study [74]. The rats were sacrificed 3 days after reperfusion. Blood samples and brain samples from the ischemic hemisphere were collected for further experiments.

Neurological Deficit Evaluation

The mNSS test was performed to evaluate the neurobehavioral function of the rats, as described in a previous study [75]. All rats were assessed by two blinded investigators before injury and at 1 day and 3 days after the operation. The mNSS is a composite of four tests (motor, sensory, reflex, and beam balance tests). Function was scored on a scale of 0 to 18, where 0 represents normal function and 18 represents maximum neurological dysfunction. Each rat was tested three times, and the average score was recorded.

H&E and Nissl Staining

After being fixed in 4% paraformaldehyde for 24 h at 4 °C, the brain samples were dehydrated and embedded in paraffin blocks. Then, serial 4-μm-thick sections were cut and mounted on poly-L-lysine-coated slides. The specimens were sequentially deparaffinized and rehydrated and then subjected to morphological examination via H&E and Nissl staining. The images were captured using an optical microscope (BX53, Olympus, Tokyo, Japan).

Study Population

Blood samples were collected from 10 patients with IS and 10 HCs at the Department of Rehabilitation Medicine of the First Affiliated Hospital of Guangxi Medical University between January 2023 and May 2023. The present study was approved by the ethical committee of the First Affiliated Hospital of Guangxi Medical University (approval no. 2022-K138-01) and registered in the Chinese Clinical Trial Registry (registration no. ChiCTR2200067102) on December 27, 2022. All IS patients underwent detailed and rigorous neurological examinations and brain magnetic resonance imaging (MRI) scans. The inclusion criteria were as follows: first-time IS, anterior circulation infarction, and large artery atherosclerosis confirmed by magnetic resonance angiography (MRA); disease duration ranging from 1 to 3 months; stable vital signs and clear consciousness; and age 18 to 75 years. The exclusion criteria were as follows: prior history of cerebral hemorrhage, severe head trauma or other neurological disease, rheumatic heart disease, heart failure, renal dysfunction, severe liver disease, or malignancy. The National Institutes of Health Stroke Scale (NIHSS) score and modified Rankin scale (mRS) score for each IS patient were collected upon admission. Each HC was age- and sex-matched to the corresponding IS patient. In addition, another 22 patients were later recruited between June 2023 and March 2024 using the same diagnostic and recruitment criteria to explore the correlation between biomarker levels and IS severity as evaluated by the mRS. All participants provided written informed consent. Approximately 4 mL of peripheral blood was drawn from each participant into ethylenediaminetetraacetic acid (EDTA) anticoagulant tubes (Sanli, Liuyang, China) between 7 a.m. and 9 a.m. on the day after admission, following 10 h of fasting. All procedures were conducted in accordance with the Declaration of Helsinki.

qRT-PCR

Total RNA from brain and blood samples was extracted using NucleoZol RNA reagent (Macherey–Nagel, Düren, Germany) and reverse transcribed into cDNA according to the instructions of the TranScript All-in-One First-Strand cDNA Synthesis SuperMix for qPCR Kit (TransGen, Beijing, China). qRT-PCR was conducted with PerfectStart® Green qPCR SuperMix (TransGen, Beijing, China) on the StepOne Real-Time PCR System (Applied Biosystems, Waltham, MA, USA). The housekeeping gene GAPDH/Gapdh was used as an internal control. Relative gene expression was quantified using the 2^(−ΔΔCt) method. The experiment was repeated independently five times. The primers used in the current study were designed and synthesized by Sangon Biotech (Shanghai, China), and their sequences are listed in Table 3.

Table 3 Primer sequences for quantitative real-time polymerase chain reaction
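
The relative quantification used in qRT-PCR can be illustrated with a short worked example in Python. The Ct values below are invented for illustration, the function name is hypothetical, and the calculation assumes roughly 100% amplification efficiency for both the target and the reference gene, as the method itself does.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene, e.g., Gapdh)
    ddCt = dCt(sample) - dCt(control)
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Invented Ct values (illustration only):
# sample:  target 24.0, reference 18.0 -> dCt = 6.0
# control: target 26.0, reference 18.0 -> dCt = 8.0
# ddCt = 6.0 - 8.0 = -2.0, so the fold change is 2 ** 2 = 4
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)
```

A value above 1 indicates higher expression in the sample than in the control, and a value below 1 indicates lower expression.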

Statistical Analysis

Statistical analyses and figure preparation were performed using GraphPad Prism software, version 8 (GraphPad Software, La Jolla, CA, USA). Normally distributed continuous data are expressed as the mean ± standard deviation (SD) and were compared using Student's t test. Nonnormally distributed variables are presented as medians and interquartile ranges and were compared using the Wilcoxon–Mann–Whitney test. The correlation between the TGF-β1 level and the mRS score was analyzed using Spearman rank correlation. P < 0.05 indicated statistical significance.
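
Because the mRS is an ordinal score, a rank-based measure of association is appropriate. The following Python sketch shows how a Spearman coefficient can be obtained as the Pearson correlation of average ranks; it is not the GraphPad Prism procedure, and the data and function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented biomarker levels and mRS scores (illustration only).
levels = [1.0, 1.4, 2.1, 2.0, 3.2]
mrs_scores = [1, 2, 3, 3, 4]

rho = spearman(levels, mrs_scores)
print(round(rho, 3))
```

Tied mRS scores are common in small cohorts, which is why the sketch uses average ranks rather than the shortcut formula that assumes no ties.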