Introduction

Prostate cancer (PCa) is a highly heterogeneous disease, encompassing various molecular and clinicopathological subtypes1. Approximately 400 fusion genes have been identified in PCa, where they promote the occurrence and development of cancer. Several of them serve as potential biomarkers or targets for personalized therapy2. Increasing evidence suggests that cancer samples can be classified into fusion positive and fusion negative subtypes according to the presence of fusion genes3. The most common fusions occur between the TMPRSS2 gene and members of the ETS family4. TMPRSS2-ERG fusion positive and fusion negative tumors represent specific molecular subtypes of PCa. DNA methylation profiling can accurately distinguish these major PCa subtypes, and may thus point to new therapeutic approaches for fusion positive PCa4.

The human ETS transcription factor family consists of 27 members5. Among them, ETV1, ETV4, ERG, and FLI1 are the most frequently involved in fusion gene events6. These fusion genes, frequently caused by translocation events in PCa, may regulate the functions of other ETS family members7. ETS family members play crucial roles in tumor cells, including in development, differentiation, proliferation, apoptosis, and migration8.

DNA methylation is an important epigenetic modification that can influence tumor development by altering the expression of tumor suppressor genes and oncogenes9. Specifically, methylation of CpG islands can lead to gene silencing, inactivating important genes such as DNA repair genes and tumor suppressor genes, a process closely associated with cancer formation10. In PCa samples containing the TMPRSS2-ERG fusion gene, hypermethylation may inhibit miR-26a, reducing its inhibition of the EZH2 enzyme and further disrupting the overall DNA methylation profile11. However, few studies have subdivided PCa DNA methylation patterns according to whether fusion genes involving the ETS family are present.

The present study aims to identify a novel subtyping of PCa based on the ETS family fusion positive and fusion negative categories. We conducted an analysis of the distribution of the ETS family fusion genes on chromosomes, fusion modes within reading frames, and predictions of structural domains for both fusion genes and their parental genes. Subsequently, we investigated the epigenomic DNA methylation patterns of PCa in different subgroups, including recurrent fusion positive and fusion negative, TMPRSS2-ERG fusion positive and fusion negative, and the ETS family fusion positive and fusion negative, to identify differentially methylated CpG sites. Finally, we integrated the methylation results with gene expression data from the same patient samples to explore the potential impact of distinct DNA methylation patterns on mRNA expression levels in PCa. These findings may provide new approaches for the treatment of PCa patients and offer new insights into personalized treatment.

Materials and methods

Publicly available data

The fusion gene data used in the present study were sourced from the ChimerDB4.0 database12 (https://www.kobic.re.kr/chimerdb/download). We obtained nucleotide and protein sequences for a total of 15,632 fusion genes associated with PCa from this database and deduplicated them. Additionally, we acquired breakpoint information for 5,653 fusion genes. Methylation data related to PCa were obtained from the UCSC Xena13 database (https://xenabrowser.net/datapages/). The mapping between gene names and methylation site IDs also came from the UCSC Xena database.

Differentially expressed parental fusion genes

The RNA-seq dataset for PCa patients was downloaded from the TCGA database (https://www.cancer.gov/) and includes 481 PCa samples and 51 adjacent non-tumor tissue samples. The samples in this dataset correspond one-to-one with the samples of the fusion gene dataset from the ChimerDB4.0 database. Building on our laboratory's previous research, we identified 764 genes differentially expressed between PCa and normal tissues14. We used the R package 'DESeq2' to screen for differentially expressed genes between PCa samples and normal tissue samples, with thresholds of |logFC| > 1, FDR < 0.01, and P < 0.05. By intersecting these genes with the 5,653 PCa-specific fusion genes from the ChimerDB4.0 database (reduced to 1,399 after deduplication), we identified 165 differentially expressed fusion precursors (hereafter referred to as parental genes). Among them, 23 parental genes exhibited repetitive fusions (occurring 2 times or more). Additionally, we identified 106 fusion genes, including 45 with repetitive fusions; 104 of the 106 are single parental differentially expressed fusion genes (SPDEFGs) and 2 are dual parental differentially expressed fusion genes (DPDEFGs). Subsequently, we extracted chromosomal location data for these 106 fusion genes, obtaining location information for a total of 1,334 loci.
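The screening and intersection steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study, which relied on the R package 'DESeq2': the `DegRow` record, the function names, and the toy gene names are all hypothetical, and only the three thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DegRow:
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log_fc: float   # log fold change, tumor vs. normal
    p_value: float
    fdr: float


def significant_genes(rows, min_abs_log_fc=1.0, max_fdr=0.01, max_p=0.05):
    """Return the genes passing all three thresholds quoted in the text."""
    return {
        r.gene
        for r in rows
        if abs(r.log_fc) > min_abs_log_fc and r.fdr < max_fdr and r.p_value < max_p
    }


def split_by_parental_hits(fusion_pairs, deg_set):
    """Split fusions into those with one or two differentially expressed parents."""
    single, dual = [], []
    for five_prime, three_prime in fusion_pairs:
        hits = (five_prime in deg_set) + (three_prime in deg_set)
        if hits == 1:
            single.append((five_prime, three_prime))
        elif hits == 2:
            dual.append((five_prime, three_prime))
    return single, dual


# Toy input with made-up genes and statistics (not the study's data).
toy_rows = [
    DegRow("GENE_A", 2.3, 1e-6, 1e-4),
    DegRow("GENE_B", -1.8, 2e-4, 5e-3),
    DegRow("GENE_C", 0.4, 1e-3, 8e-3),   # fails the |logFC| threshold
    DegRow("GENE_D", 1.5, 0.03, 0.2),    # fails the FDR threshold
]
toy_fusions = [("GENE_A", "GENE_X"), ("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")]

deg = significant_genes(toy_rows)
single_parent, dual_parent = split_by_parental_hits(toy_fusions, deg)
print(sorted(deg), single_parent, dual_parent)
```

In this sketch, a fusion counts as single parental when exactly one partner is differentially expressed and as dual parental when both are, which mirrors the SPDEFG and DPDEFG labels.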

The distribution and fusion patterns of fusion genes on chromosomes

We utilized Circos (version 0.69–8) (https://circos.ca/) to illustrate the distribution of the previously obtained 106 fusion genes on chromosomes. Additionally, we conducted a statistical analysis of the reading frame fusion patterns for these 106 fusion genes and employed the 'ggplot2' R package to generate a circular plot for visualization.

Domain analysis of fusion genes

We used the SMART15 website (http://smart.embl-heidelberg.de/) for domain prediction. For parental genes, we initially obtained protein sequences for 165 parental genes. During this process, four genes (LPHN2, C4orf22, C16orf62, ARHGAP23) did not have corresponding protein sequences, so we ended up with protein sequences for 161 parental genes. For fusion genes, due to data limitations, we only queried genes with repetitive fusions. Among the 106 fusion genes, we found 59 pairs (comprising a total of 72 parental genes) with repetitive fusions. Among the 72 parental genes, the queried protein sequences include those of 22 recurrent fusion parental genes and 50 non-recurrent fusion parental genes. By retrieving protein sequences for these 59 pairs of genes with repetitive fusions, we ultimately obtained sequences for 23 pairs of fusion genes and predicted their structural domains.

Survival analysis of fusion positive and fusion negative PCa

We grouped all tumor samples containing fusion genes into recurrent fusion positive (n = 82) and recurrent fusion negative (n = 399) groups, then analyzed the correlation between overall survival (OS) and recurrent fusion status. Additionally, because TMPRSS2-ERG is the most common fusion gene in PCa, we categorized tumors into TMPRSS2-ERG fusion positive (n = 186) and TMPRSS2-ERG fusion negative (n = 295) groups based on its presence, and analyzed the correlation between OS and TMPRSS2-ERG fusion status. Furthermore, based on whether tumor samples contained ETS family fusion genes, we subdivided the samples into ETS family fusion positive (n = 227) and ETS family fusion negative (n = 254) groups, and analyzed the correlation between OS and ETS family fusion status.
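The text does not name the survival estimator or the test behind the reported p-values. As an illustration only, the sketch below assumes the common choice of Kaplan-Meier curves compared with a two-group log-rank test; the function names and the toy follow-up data are hypothetical.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    `events[i]` is 1 if the death was observed at `times[i]`, 0 if censored.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)          # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns the chi-square (1 df) p-value."""
    group_a = list(zip(times_a, events_a))
    group_b = list(zip(times_b, events_b))
    event_times = sorted({t for t, e in group_a + group_b if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x, _ in group_a if x >= t)   # at risk in group A
        n_b = sum(1 for x, _ in group_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in group_a if e and x == t)
        d_b = sum(1 for x, e in group_b if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi_square / 2.0))


# Toy follow-up times in years (made up, not TCGA data).
pos_times, pos_events = [2.0, 3.5, 5.0, 6.0, 8.0], [1, 1, 0, 1, 0]
neg_times, neg_events = [4.0, 6.5, 7.0, 9.0, 9.5], [0, 1, 0, 0, 0]
print(kaplan_meier(pos_times, pos_events))
print(logrank_p(pos_times, pos_events, neg_times, neg_events))
```

With groups as unbalanced as 82 versus 399 samples and few observed deaths, the tail of each curve rests on very few patients, which is worth keeping in mind when reading late follow-up differences.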

Differential methylation expression and pathway enrichment analysis

We applied the three sample groupings used in the survival analysis to investigate DNA methylation differences in PCa. Differential analyses for the three groupings were performed using the 'limma' R package, with p < 0.05 and |logFC| > 0.5 as the filtering criteria for significantly differentially methylated CpG sites. We then used the 'pheatmap' R package to draw a separate heatmap of the differential results for each grouping. We performed KEGG pathway enrichment analysis on the mRNAs corresponding to the differentially methylated CpG sites using the DAVID16 website (https://david.ncifcrf.gov/) to identify the important pathways in which these genes are involved. Finally, using the R package 'ggplot2', we visualized the significantly enriched (p < 0.05) KEGG pathways as bubble charts.
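To make the enrichment step concrete, the following sketch computes an over-representation p-value for one pathway with a plain hypergeometric tail. It is an assumption-laden illustration: DAVID reports its own, more conservative variant of this test, and the background size, pathway size, and hit count used here are invented for the example.

```python
from math import comb


def hypergeom_enrichment_p(pop_size, pop_hits, list_size, list_hits):
    """P(X >= list_hits) for a pathway under random sampling without replacement.

    pop_size  : genes in the background
    pop_hits  : background genes annotated to the pathway
    list_size : genes in the query list
    list_hits : query genes annotated to the pathway
    """
    upper = min(pop_hits, list_size)
    total = comb(pop_size, list_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: a 93-gene list, a 20,000-gene background,
# a 50-gene pathway, and 4 list genes falling in that pathway.
p = hypergeom_enrichment_p(pop_size=20000, pop_hits=50, list_size=93, list_hits=4)
print(p < 0.05)
```

Because many pathways are tested at once, a raw p < 0.05 cutoff like the one above is lenient; a multiple-testing correction would be a natural addition.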

Results

Distribution of fusion genes on and between chromosomes

In order to obtain information about the fusion patterns of fusion genes on chromosomes, we queried the fusion relationships of 106 fusion genes on and between chromosomes (Fig. 1A). Fusion events most commonly involve chromosomes 1, 4, 3, and 21. Overall, the frequency of interchromosomal fusion events was higher than that of intrachromosomal fusion events. Among them, fusion genes related to the ETS family had the highest occurrence frequency on chromosome 21, and the frequency of interchromosomal fusion events exceeded that of intrachromosomal fusion events (Fig. 1B).
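The counts summarized above amount to a simple tally over fusion partner chromosomes. A minimal sketch, assuming each fusion gene is reduced to a hypothetical (5' chromosome, 3' chromosome) pair; the input below is a toy example, not the 106 fusion genes of the study.

```python
from collections import Counter


def summarize_fusions(fusions):
    """Tally intra- vs interchromosomal fusions and per-chromosome involvement.

    `fusions` is an iterable of (chrom_5p, chrom_3p) pairs, one per fusion gene.
    """
    kinds = Counter()
    per_chrom = Counter()
    for chrom_5p, chrom_3p in fusions:
        kinds["intra" if chrom_5p == chrom_3p else "inter"] += 1
        # A set counts each chromosome once per fusion, even when both partners share it.
        per_chrom.update({chrom_5p, chrom_3p})
    return kinds, per_chrom


# Toy input with made-up chromosome assignments.
toy_fusions = [("chr21", "chr21"), ("chr21", "chr13"), ("chr1", "chr4")]
kinds, per_chrom = summarize_fusions(toy_fusions)
print(kinds["inter"], kinds["intra"])
print(per_chrom.most_common(1))
```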

Fig. 1

Distribution of Fusion Genes on and between Chromosomes. (A) Distribution of 106 fusion genes on and between chromosomes; (B) Distribution of fusion genes involving the ETS family on and between chromosomes. Colored lines represent fusion events occurring between different chromosomes, while gray lines represent fusion events occurring within the same chromosome.

Reading frame fusion patterns of fusion genes

We conducted a statistical analysis of the reading frame fusion patterns of the 106 fusion genes (Fig. 2A, C). Just over one-third of fusion events were in-frame (36.79%, n = 39). In-frame fusions preserve the original reading frame of the parental genes, indicating potential functional retention when translation occurs. Additionally, 26.42% of fusion events (n = 28) involved untranslated regions (UTRs), which often contain regulatory elements that can alter the expression levels of adjacent genes. Finally, an equal share of fusion events were out-of-frame (36.79%, n = 39), potentially leading to loss of gene function. Among fusion genes involving the ETS family, out-of-frame fusion was the most common reading frame fusion pattern (Fig. 2B).
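The three categories can be illustrated with a small classifier based on codon phase, followed by a recomputation of the shares from the reported counts. This is a sketch under stated assumptions: the function names and the breakpoint encoding (coding bases on each side of the junction, with `None` for a breakpoint in a UTR) are hypothetical, and only the counts 39, 28, and 39 come from the text.

```python
def frame_status(cds_bases_5p, cds_offset_3p):
    """Classify a fusion junction by reading frame.

    cds_bases_5p  : coding bases retained from the 5' partner (None if in a UTR)
    cds_offset_3p : coding bases of the 3' partner upstream of its breakpoint,
                    i.e. not retained in the fusion (None if in a UTR)
    """
    if cds_bases_5p is None or cds_offset_3p is None:
        return "UTR"
    # The frame is preserved when the first retained 3' base keeps its codon position.
    same_phase = cds_bases_5p % 3 == cds_offset_3p % 3
    return "in-frame" if same_phase else "out-of-frame"


def shares(counts):
    """Convert category counts to percentages rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


print(frame_status(300, 150))   # both breakpoints fall on a codon boundary
print(frame_status(301, 150))   # one extra base shifts the 3' reading frame
print(frame_status(None, 150))  # 5' breakpoint lies in a UTR

reported_counts = {"in-frame": 39, "UTR": 28, "out-of-frame": 39}
print(shares(reported_counts))
```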

Fig. 2

Reading Frame Fusion Patterns of Fusion Genes. (A) Reading frame fusion patterns of 106 fusion genes; (B) Reading frame fusion patterns of fusion genes involving the ETS family; (C) Schematic representation of reading frame fusion patterns of fusion genes, where "In-Frame" indicates that the coding regions of gene A and gene B are connected within the same reading frame, avoiding frameshift mutations, while "Out-of-Frame" indicates that the coding regions of gene A and gene B are not within the same reading frame, potentially causing frameshift mutations.

Structural domains of fusion genes

To study the differences between the domains of recurrent fusion genes and their corresponding parental genes, we separately analyzed the domains of recurrent fusion genes and their parental genes. The DPDEFGs ERG-ABCC4 and OR51E2-AMACR are shown (Fig. 3A, B). For SPDEFGs, we identified a total of 23 pairs (Fig. 3C–G). Domain prediction was conducted for these 23 pairs of recurrent fusion genes and their parental genes, resulting in a total of 5 fusion types (with one fusion gene unable to find corresponding domains) (Supplementary Table S1):

  (a) Retained partial domains of one parental gene (n = 12);

  (b) Combined domains of both parental genes, resulting in new domains (n = 2);

  (c) Retained the domains of both parental genes separately (n = 3);

  (d) Neither of the parental genes' domains were retained (n = 4);

  (e) Retained all domains of one parental gene (n = 1).

Fig. 3

Structural Domains of Fusion Genes. Each fusion gene structure is displayed with the gene name and corresponding chromosome above it (in parentheses). (A, B) Structural domains of DPDEFGs. Gray lines represent genes; boxes with numbers represent exons (5' gene: orange, 3' gene: purple); dashed boxes indicate exons not shown in the figure; red stars represent fusion points; protein domains are displayed below the exon boxes, with TM in the blue box indicating transmembrane domains. (C–G) Structural domains of SPDEFGs. Various shapes on the lines represent different domains. The SPDEFGs are: BAZ1B, LSAMP, ERG, BANK1, and KLK2.

In summary, the structural domains of fusion genes exhibited several scenarios: a portion involved the direct overlay of the structural domains of both parental genes, another portion entailed fusion genes retaining the domains of only one parental gene, and a further portion involved partial retention of the structural domains of both parental genes. Only a very small number of fusion genes introduced new domains on top of inheriting the original parental gene domains. Specifically, fusion genes with the ETS family genes as parental genes mostly retained partial domains of only one parental gene.

Survival analysis of fusion positive and fusion negative groups

To explore the relationship between fusion status and overall survival (OS), we evaluated PCa samples under three groupings. Survival analysis of the recurrent fusion positive group (n = 82) and recurrent fusion negative group (n = 399) (Fig. 4A) revealed that, within the first 9 years after cancer diagnosis, the survival rate of the recurrent fusion negative group was higher than that of the recurrent fusion positive group (p = 0.032). Survival analysis of the TMPRSS2-ERG fusion positive group (n = 186) and the TMPRSS2-ERG fusion negative group (n = 295) (Fig. 4B) showed a lower survival rate in the fusion negative group than in the fusion positive group, although the difference was not statistically significant (p = 0.41). For the ETS family fusion positive group (n = 227) and ETS family fusion negative group (n = 254), the survival analysis (Fig. 4C) showed that within the first 9 years the survival rate of the fusion positive group was higher than that of the fusion negative group, again without statistical significance (p = 0.99).

Fig. 4

Survival analysis of fusion positive and fusion negative groups in different subgroups. (A) Survival analysis of the recurrent fusion positive and recurrent fusion negative groups. (B) Survival analysis of TMPRSS2-ERG fusion positive and fusion negative groups, where T2E represents the fusion gene TMPRSS2-ERG. (C) Survival analysis of the ETS family fusion positive and fusion negative groups.

DNA methylation in fusion positive and fusion negative PCa tumors

To explore the DNA methylation status of parental fusion genes, we conducted differential analysis of methylated CpG sites between fusion positive and fusion negative PCa tumors under each grouping. Between the recurrent fusion positive and recurrent fusion negative groups, we identified a total of 183 differentially methylated CpG sites, with 12 upregulated and 171 downregulated (Fig. 5A) (Supplementary Table S2). Based on the methylation levels of these 183 CpG sites, the samples were clustered into two main clusters, with one cluster containing 37% of the fusion positive tumors and the other containing 86% of the fusion negative tumors (Fig. 5B). These data suggested that the epigenetic characteristics of these 183 CpG sites could be used to distinguish between fusion positive and fusion negative PCa.

Fig. 5

DNA methylation in PCa tumor with different fusion status. (A) Volcano plot showing differential expression of CpG sites between recurrent fusion positive and recurrent fusion negative groups. (B) Hierarchical clustering heatmap of CpG methylation levels in recurrent fusion positive and recurrent fusion negative groups. (C) Volcano plot depicting differential expression of CpG sites between TMPRSS2-ERG fusion positive and fusion negative groups. (D) Hierarchical clustering heatmap of CpG methylation levels in TMPRSS2-ERG fusion positive and fusion negative groups. (E) Volcano plot illustrating differential expression of CpG sites between the ETS family fusion positive and fusion negative groups. (F) Hierarchical clustering heatmap of CpG methylation levels in the ETS family fusion positive and fusion negative groups.

Similarly, based on the methylation differences between the TMPRSS2-ERG fusion positive group and the TMPRSS2-ERG fusion negative group, we identified a total of 141 differentially methylated CpG sites, with 99 upregulated and 42 downregulated (Fig. 5C) (Supplementary Table S3). Based on the methylation levels of these 141 CpG sites, the samples were clustered into two main clusters, with one cluster containing 83% of the TMPRSS2-ERG fusion positive tumors and the other containing 89% of the TMPRSS2-ERG fusion negative tumors (Fig. 5D). These data suggested that the epigenetic characteristics of these 141 CpG sites could distinguish between TMPRSS2-ERG fusion positive and TMPRSS2-ERG fusion negative PCa.

In addition, based on the differential methylation analysis between the ETS family fusion positive group and the ETS family fusion negative group, we identified a total of 134 differentially methylated CpG sites, with 60 upregulated and 74 downregulated (Fig. 5E) (Supplementary Table S4). Based on the methylation levels of these 134 differentially expressed CpG sites, the samples were clustered into two main clusters, with one cluster containing 88% of the ETS family fusion positive tumor and the other cluster containing 89% of the ETS family fusion negative tumor (Fig. 5F). These data suggested that the epigenetic characteristics based on these 134 CpG sites could distinguish between the ETS family fusion positive and the ETS family fusion negative PCa.
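The cluster composition percentages quoted in these paragraphs can be derived from a cross-tabulation of fusion status against cluster assignment. The sketch below assumes the cluster labels have already been produced by hierarchical clustering; the function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def group_shares_by_cluster(groups, clusters):
    """For each group, return the fraction of its samples in each cluster."""
    table = defaultdict(Counter)
    for group, cluster in zip(groups, clusters):
        table[group][cluster] += 1
    return {
        group: {cluster: n / sum(counts.values()) for cluster, n in counts.items()}
        for group, counts in table.items()
    }


# Toy labels: 8 fusion positive and 9 fusion negative samples (made up).
toy_groups = ["pos"] * 8 + ["neg"] * 9
toy_clusters = [1] * 7 + [2] * 1 + [2] * 8 + [1] * 1
toy_shares = group_shares_by_cluster(toy_groups, toy_clusters)
print(toy_shares["pos"][1], toy_shares["neg"][2])
```

Reading both shares together matters: a cluster that captures most fusion negative tumors but only a minority of fusion positive tumors separates the groups less cleanly than the individual percentages might suggest.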

Impact of DNA methylation on mRNA expression levels

We integrated DNA methylation results with mRNA expression data from the same PCa patient samples to investigate the potential impact of differential DNA methylation on mRNA expression levels. Since we focused on the ETS family fusion positive and fusion negative subtypes, we only analyzed the correlation between DNA methylation and mRNA expression levels for this grouping. Between the ETS family fusion negative and positive groups, we identified 134 CpG sites with significant methylation differences, which mapped to 93 mRNAs whose expression levels were tested for correlation with methylation (Supplementary Fig. S1). We selected the top 10 CpG sites with the most significant differences based on p-values and examined the correlation between the methylation levels of these sites and the expression levels of the corresponding 8 mRNAs (Fig. 6A–H). The methylation levels of most CpG sites showed a weak positive correlation with the corresponding mRNA expression levels. However, two hypermethylated CpG sites showed a strong negative correlation with their corresponding mRNA expression levels, namely cg24345747 with CD8A (cor = −0.388, p = 2.041e−18) (Fig. 6B) and cg17701886 with B3GNT5 (cor = −0.253, p = 1.946e−08) (Fig. 6G).
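The per-site correlation can be sketched as follows. The text does not state which coefficient was used, so this example assumes Pearson correlation, and the p-value uses the Fisher z approximation rather than an exact t-test; the function names and the toy beta values and expression levels are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    r = sxy / math.sqrt(sxx * syy)
    return max(-1.0, min(1.0, r))   # guard against rounding just outside [-1, 1]


def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for r via the Fisher z transformation."""
    if n < 4:
        raise ValueError("the Fisher z approximation needs at least 4 samples")
    if abs(r) == 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Toy data: methylation beta values and expression of one gene (made up).
methylation = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
expression = [9.1, 8.4, 8.6, 7.2, 6.9, 6.1]
r = pearson_r(methylation, expression)
print(r, approx_two_sided_p(r, len(methylation)))
```

With several hundred samples, even a modest coefficient yields a very small p-value, so the magnitude of the correlation and its p-value are best read separately.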

Fig. 6

The correlation between the methylation levels of the top 10 CpG sites with the most significantly different p-values in the ETS family fusion positive and fusion negative tumor groups and the expression levels of the corresponding 8 mRNAs. Among them, the low methylation sites cg06741367 and cg07059052 did not have corresponding mRNAs identified.

Then, we performed KEGG pathway enrichment analysis on the 93 mRNAs corresponding to the 134 differentially methylated CpG sites (Supplementary Fig. S2). These genes were involved in several KEGG pathways including fatty acid metabolism, Notch signaling pathway, adrenergic signaling in cardiomyocytes, and the synthesis and degradation of ketone bodies, primarily related to metabolic processes and regulation of signal transduction.

Discussion

In this study, we proposed considering the ETS family gene fusions in PCa as two distinct subtypes, positive and negative, and found substantial differences in the DNA methylation profiles between these subtypes. We analyzed the distribution of fusion genes in PCa and ETS family fusion positive genes on chromosomes, the fusion modes of reading frames, and the structural domain predictions of the fusion genes and their parental genes, in order to reveal the role of ETS family fusion genes in gene structure and functional regulation. Subsequently, we investigated the epigenomic DNA methylation patterns in different subgroups, including PCa recurrent fusion positive and negative, TMPRSS2-ERG fusion positive and negative, and the ETS family fusion positive and negative subtypes. This exploration aimed to identify differentially methylated CpG sites, shedding light on the relationships between various subgroups and overall survival rates. The findings revealed an increasing trend in mortality rates for PCa tumor with recurrent fusion genes, those without TMPRSS2-ERG fusion genes, and those with the ETS family fusion genes. Finally, we integrated the methylation results with gene expression data from the same patient samples to explore the potential impact of different DNA methylation patterns on mRNA expression levels in PCa.

In order to validate the ETS family fusion positive and negative subtypes of PCa, we conducted survival analysis and clustering of DNA methylation profiles using a uniform analytical approach for both the recurrent fusion positive and negative groups, as well as the TMPRSS2-ERG fusion positive and negative groups. The study results showed that ETS family fusion positive and negative PCa tumor could be distinctly differentiated by their DNA methylation profiles. In PCa, the subdivision of subtypes based on the DNA methylation landscape reveals significant differences between TMPRSS2-ERG gene fusion positive and TMPRSS2-ERG gene fusion negative tumors, elucidating distinct potential carcinogenic pathways between these molecular subtypes11. Furthermore, for rhabdomyosarcoma, the study of methylation characteristics of the PAX3-FOXO1 and PAX7-FOXO1 fusion genes, which were formed by linking the DNA binding domain of PAX3 or PAX7 to the transactivation domain of FOXO1, provided a new method for distinguishing between fusion positive and fusion negative rhabdomyosarcoma17.

We found that fusion events of PCa fusion genes were mainly concentrated on chromosomes 1, 4, and 21, with fusion events within chromosomes 3 and 4 being the most common. ETS family fusion events were most frequent on chromosome 21, and interchromosomal fusions were more frequent than intrachromosomal fusions. Fusion events on these chromosomes are closely related to cancer and other diseases. For example, the FIP1L1-PDGFRA fusion gene associated with hypereosinophilic syndrome (HES) is located on chromosome 4. Molecular cytogenetic analyses have shown that, owing to an interstitial deletion on chromosome 4, the FIP1L1 gene fuses with and constitutively activates the PDGFRA gene, producing a protein with tyrosine kinase activity that stimulates sustained proliferation of eosinophils18. Additionally, the fusion gene AML1-ETO is associated with acute myeloid leukemia (AML), leading to decreased survival rates and increased recurrence rates19. AML1-ETO, which results from a translocation between chromosomes 8 and 21, is one of the most common chromosomal abnormalities in AML20. Within the ETS family of transcription factors, ETS Proto-Oncogene 2 (ETS2) is an important member located on chromosome 21. The protein encoded by ETS2 is a Ca2+ dependent phosphorylated protein involved in regulating physiological and pathological processes such as cell proliferation, differentiation, and apoptosis21.

The reading frame fusion modes of fusion genes in PCa were distributed roughly evenly among in-frame, out-of-frame, and UTR (3'-UTR and 5'-UTR) fusions, whereas ETS family fusion genes were mainly out-of-frame. In acute myeloid leukemia, t(1;21) can lead to out-of-frame fusion of RUNX1-CLCA2, and these out-of-frame fusions can generate hypothetical truncated RUNX1 isoforms22. Such out-of-frame fusions retain the DNA-binding Runt domain but lack the transcriptional regulatory domains of RUNX1. Truncated RUNX1 can promote the development of leukemia in patients23. In Philadelphia chromosome (Ph) positive leukemia, the COOH terminal portion of the transcription product of the tumor specific antigen Bcr-ABL contains an out-of-frame amino acid sequence encoded by the ABL gene. These variants are expressed in Ph positive chronic myeloid leukemia (CML) and acute lymphoblastic leukemia (ALL) patients24. If a frameshift occurs at the gene fusion point, it can convert a functional in-frame fusion into a dysfunctional out-of-frame fusion, thereby affecting the structure and function of the fusion protein25. Future research should precisely identify fusion breakpoints and analyze the impact of frameshift fusions on the reading frame to understand disease mechanisms and guide therapeutic strategies.

We conducted a statistical analysis of the domains of parental genes and their corresponding fusion genes. The domains of fusion genes fell into several categories: some fusion genes carried the combined domains of both parental genes, while others retained the domains of only one parental gene. A further subset retained only a portion of the domains of one parental gene. Only a very small number of fusion genes introduced new domains in addition to those of the original parental genes. The preserved fusion gene domains play crucial roles in transcription, cell signaling, and the immune system. For example, the DDT (DNA-binding homeobox and different transcription factors) domain is found in various transcription and chromatin remodeling factors. It collaborates with other protein domains to regulate biological processes such as transcription, replication, and repair26. The DBB domain is a domain shared by Dof (downstream of FGFR), BANK1 (B-cell scaffold protein with ankyrin repeats 1), and BCAP (B-cell adapter for PI3K) proteins. This domain typically contains functional regions related to cell signaling27. The NLRP3 inflammasome is a crucial component of the innate immune system, and its aberrant activation can lead to inflammatory diseases. The LRR (leucine-rich repeat) domain controls inflammasome activation by mediating NLRP3 self-association, oligomerization, and interaction with the essential regulator NEK728. Additionally, new domains such as IG_like emerged in the fusion genes. The polycystic kidney disease gene PKD1 encodes polycystin-1, which includes 16 IG_like domains (or PKD domains), indicating its significant role in cell–cell or cell–matrix interactions29.

Survival analysis revealed that after 9 years of follow-up, the survival rate of the recurrent fusion negative group declined, whereas, owing to data limitations, subsequent survival rates could not be shown for the recurrent fusion positive group. A statistically significant trend indicated that tumors in the recurrent fusion positive group had a higher risk of death than those in the recurrent fusion negative group. Although the trend for the TMPRSS2-ERG fusion positive and negative groups was not statistically significant, the risk of disease progression or death in the TMPRSS2-ERG fusion negative group increased over time relative to the fusion positive group. Similarly, for the ETS family fusion positive and negative groups, the survival rate of the fusion positive group decreased markedly after 9 years, while subsequent survival rates could not be obtained for the fusion negative group due to data limitations. Although the trend was not statistically significant, the estimate suggested that the risk of disease progression or death was higher in the ETS family fusion positive group than in the fusion negative group. This was also a limitation of the study: we only used PCa samples from TCGA, which is a small dataset for detecting such differences. By analyzing the methylation levels of differentially methylated CpG sites between the ETS family fusion positive and negative groups, we conducted a cluster analysis of the samples and identified two main clusters. One cluster contained 88% of fusion positive tumors, and the other cluster contained 89% of fusion negative tumors. This suggested that the differentially methylated CpG sites were closely related to ETS family fusion status.

We identified hypermethylated CpG sites with significant differences between the ETS family fusion positive and fusion negative groups, including cg24345747 and cg17701886. We found a strong negative correlation between the methylation levels of these two CpG sites and the mRNA expression of the corresponding genes, CD8A and B3GNT5. These genes play crucial roles in the pathogenesis of cancer and other diseases. CD8A (Cluster of Differentiation 8A) belongs to the T cell cytotoxic pathway related genes and encodes the CD8 antigen, which cooperates with the T cell receptor on T cells in antigen recognition30. In childhood asthma samples, higher methylation of CpG sites in the CD8A promoter region significantly downregulated CD8A expression, affecting the TCR (T-cell receptor) signaling pathway and thereby regulating the progression of childhood atopic asthma31. Radiogenomic features indicated that preoperative prediction of CD8A expression in bladder cancer patients helped predict patient prognosis and sensitivity to immunotherapy32. Copy number amplification and promoter hypomethylation of the B3GNT5 (β-1,3-N-acetylglucosaminyltransferase 5) gene contribute to its overexpression in the most invasive subtype of breast cancer, basal-like breast cancer (BLBC), for which it serves as a prognostic marker and therapeutic target33. Dysregulation of sphingolipid metabolism was the major altered pathway in non-small cell lung cancer patients34, and the B3GNT5 gene, along with the GAL3ST1 (galactose-3-O-sulfotransferase 1) gene, altered the levels of metabolites such as lactate, sphingolipids, and sulfides in the serum of non-small cell lung cancer patients35. This differential regulation affected the proliferation, migration, and invasion of tumor cells35.

The correlation between the top ranked differentially expressed CpG sites with significant p-values for hypermethylation and the corresponding mRNA expression levels could be either positive or negative, depending on the location of aberrantly methylated CpG sites within the gene36. For instance, the downregulation of the DNMT3B gene could result in upregulation through DNA remethylation, depending on its local chromatin structure37.

Conclusions

In conclusion, there were substantial differences in the DNA methylation profiles between PCa tumors that are positive and negative for ETS family gene fusions. We analyzed the distribution of fusion genes in PCa and ETS family fusion positive genes on chromosomes, the fusion modes of reading frames, and the structural domain predictions of the fusion genes and their parental genes, in order to reveal the role of ETS family fusion genes in gene structure and functional regulation. Furthermore, hypermethylated CpG sites showed a strong negative correlation with the corresponding mRNA of CD8A and B3GNT5, indicating potential as prognostic markers for PCa. Therefore, by categorizing PCa into ETS family fusion positive and fusion negative subtypes based on epigenomic DNA methylation profiles, we have provided new perspectives on the treatment of PCa.