1 Introduction

Colorectal cancer (CRC) is one of the major cancers threatening the lives and health of Chinese citizens, and it imposes a significant societal burden. According to the most recent data from the National Cancer Center, there were 387,600 new cases of colorectal cancer in China in 2015, accounting for 9.87% of all malignant tumor incidences; colorectal cancer also caused 187,100 deaths, or 8.01% of all malignant tumor deaths [1]. The typical progression time from precancerous lesions to cancer is 5 to 10 years, which presents a critical window of opportunity for early diagnosis and therapeutic intervention [2]. The stage at diagnosis also has a significant bearing on prognosis: only 14% of patients with stage IV colorectal cancer and distant metastases survive for five years, compared with 90% of those with stage I disease. Surgery, chemotherapy, molecular immunotherapy, and other drug-based approaches are currently the main forms of treatment for colorectal cancer [3]. However, an effective treatment for advanced colorectal cancer is still lacking.

Several genetic and molecular alterations occur during the development and course of colorectal cancer. The two most significant are the sustained activation of oncogenes and the sustained loss of tumor suppressor genes [4, 5]. Classical tumor suppressor genes such as P53, APC, PTEN, SMAD4, ACVR2, and BAX are lost to varying degrees in colorectal tumors [6, 7]. Approximately 70 to 80% of colorectal tumors exhibit functional inactivation attributed to somatic mutations in the APC gene, and this inactivation serves as a critical initiating event in the progression of intestinal tumors towards malignancy. In addition, approximately 40% of large colorectal adenomas (greater than 4 centimeters in size) harbor KRAS gene mutations. These alterations contribute to the complex genetic landscape of colorectal cancer, highlighting the importance of understanding their roles in disease initiation and progression [8]. The RAS gene regulates many downstream signaling pathways, such as MAPK and PI3K. About 5–10% of colorectal tumors have B-raf mutations; RAS activates B-raf, which in turn activates the MAPK effectors MEK1 and MEK2. About 60–70% of colorectal tumors have P53 gene mutations, and point mutations in this gene seriously interfere with the normal cell cycle, thereby accelerating the growth and invasion of colorectal cancer cells [9, 10]. Consequently, gene sequencing technology is widely used to detect biomarkers in solid tumors and to predict immunotherapy effects.

DNA methylation is a significant epigenetic change that, without altering the DNA sequence, covalently attaches a methyl group to carbon position 5 of cytosine in CpG dinucleotides in the genome, thereby controlling the expression of genes [11, 12]. This alteration may affect the accessibility of a gene’s promoter region, which in turn may control the gene’s transcriptional activity. Aberrant alterations in DNA methylation, such as the hypermethylation of tumor suppressor genes and the hypomethylation of oncogenes, are frequently seen in cancer. The development, occurrence, and prognosis of tumors are closely associated with these alterations. Understanding different methylation patterns in colorectal cancer is critical for diagnosis, treatment, and prognostic assessment. High levels of DNA methylation can often reduce or silence the expression of tumor suppressor genes by lowering their transcriptional activity, thereby indirectly inducing malignant tumors. Conversely, high levels of DNA methylation of cancer-promoting genes usually inhibit the occurrence of malignant tumors by reducing the transcriptional activity of those genes. Therefore, by studying methylation patterns in colorectal cancer, potential therapeutic targets and prognostic markers can be discovered, providing an important basis for personalized treatment and clinical management.

In addition, with the advancement of technology, such as high-throughput sequencing and bioinformatics analysis, we can gain a more comprehensive understanding of the methylation profile in colorectal cancer and discover key methylation events related to tumor initiation and progression [11,12,13]. Thanks to this research, the molecular mechanisms behind colorectal cancer are better understood. It also provides a vital theoretical and practical foundation for the advancement of innovative treatment methodologies and techniques for prognostic estimation.

By examining the breadth and depth of studies investigating N6-methyladenosine (m6A) RNA modification in cancers beyond CRC, we can gain a more holistic understanding of its universal and cancer-specific functions. Such a comparison would reveal whether m6A modification exhibits conserved mechanisms of action across diverse malignancies or whether its roles and impacts are tailored to specific cancer types. For instance, studies in breast cancer might uncover m6A’s involvement in estrogen receptor signaling, while in lung cancer, its influence on cell migration and invasion pathways may be more pronounced.

With the current development of computational biology, weighted gene co-expression network analysis (WGCNA) is used to identify clusters of genes with shared function, and machine learning methods are used to construct prognostic scores with significant prognostic value [14]. Therefore, this study combined transcriptome and single-cell analysis of colorectal cancer using three single-cell cohorts, GSE166555 [15], GSE146771 [16], and EMTAB8107 [17], from the Tumor Immune Single-cell Hub 2 (TISCH2) database [18], and five transcriptome cohorts: GSE17536 [19], GSE39582 [20], GSE72970 [21], and TCGA-CRC (TCGA-COAD and TCGA-READ). We analyzed the tumor heterogeneity of different methylation expression patterns in colorectal cancer and constructed a prognostic model of colorectal cancer based on methylation-related regulatory genes. This study is anticipated to provide a new theoretical foundation and clinical practice guidance for the customized treatment of colorectal cancer, and to improve patients’ health and quality of life.

2 Methods

2.1 CRC dataset sources and preprocessing

The Genotype-Tissue Expression (GTEx), Gene Expression Omnibus (GEO), and The Cancer Genome Atlas (TCGA) databases were searched for public gene-expression data with comprehensive clinical annotation. Patients without survival data were excluded from further analysis. We also obtained information on somatic mutations from the TCGA database.

For CRC scRNA-seq datasets, we obtained three publicly available cohorts, GSE166555 [15], GSE146771 [16], and EMTAB8107 [17], from the Tumor Immune Single-cell Hub 2 (TISCH2) database [18]. The three scRNA-seq datasets were analyzed following the Seurat workflow and integrated using the Harmony algorithm. Each cell cluster was annotated using the SingleR R package [22] and literature references.

2.2 Extraction of expression levels of m6A regulators

We identified the genes shared across GSE17536 [19], GSE39582 [20], GSE72970 [21], and TCGA-CRC (TCGA-COAD and TCGA-READ), and merged the two TCGA datasets (termed TCGA-CRC) using the “sva” package. A total of 20 m6A regulators were extracted from previous work, including 2 erasers (ALKBH5 and FTO), 7 writers (METTL3, METTL14, WTAP, VIRMA, RBM15, RBM15B, and ZC3H13), and 11 readers (YTHDC1, IGF2BP1, IGF2BP2, IGF2BP3, YTHDF1, YTHDF3, YTHDC2, HNRNPA2B1, YTHDF2, HNRNPC, and RBMX) [23, 24]. Subsequently, distinct m6A modification patterns were identified by unsupervised clustering of patients based on the expression of the 20 m6A regulators. The stability of the cluster numbers was determined using the consensus clustering algorithm. The “ConsensusClusterPlus” [25] package was used for these steps.

2.3 Gene set variation analysis (GSVA) and functional annotation

To comprehensively evaluate the differences in biological processes and signaling pathways that underlie distinct patterns of m6A modification alterations in our study, we employed Gene Set Variation Analysis (GSVA), a powerful tool for assessing the activity of predefined gene sets within a given sample set. This analysis was facilitated by leveraging the functionality of the “GSVA” R package, which offers a streamlined and intuitive interface for performing such calculations.

To construct our gene sets for GSVA enrichment analysis, we turned to the Molecular Signatures Database (MSigDB), a comprehensive resource that hosts a wide range of annotated gene sets derived from various biological contexts. Specifically, we downloaded two gene set collections from MSigDB: the “c2.cp.kegg.v2022.1.Hs.symbols.gmt” gene set, which contains curated gene sets based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and the “c5.go.bp.v2022.1.Hs.symbols.gmt” gene set, which comprises gene sets annotated with Gene Ontology (GO) biological process terms. Both gene set collections were retrieved in their latest versions (as of 2022.1) and used in their symbol format, ensuring compatibility with our gene expression data.
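As an illustration of the gene set input described above, the following minimal Python sketch parses text in the GMT layout, in which each tab-delimited line holds a gene set name, a description field, and then the member gene symbols. The study itself used R; the function name `parse_gmt`, the set names, and the gene symbols below are hypothetical placeholders and are not taken from the MSigDB collections.

```python
def parse_gmt(text):
    """Parse GMT-formatted text into {gene_set_name: [gene symbols]}."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # A valid GMT line has a name, a description, and at least one gene.
        if len(fields) < 3:
            continue
        name, genes = fields[0], fields[2:]
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


# Hypothetical two-line GMT content, for illustration only.
toy_gmt = (
    "TOY_PATHWAY_ONE\tdescription one\tGENE_A\tGENE_B\tGENE_C\n"
    "TOY_PATHWAY_TWO\tdescription two\tGENE_B\tGENE_D\n"
)
toy_sets = parse_gmt(toy_gmt)
print(toy_sets)
```

A dictionary of this shape (set name mapped to gene symbols) is the kind of structure that an enrichment method then scores against an expression matrix.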

By applying GSVA analysis to our dataset using these gene sets, we were able to quantify the activity of KEGG pathways and GO biological processes across the different patterns of m6A alteration observed in our study. This approach allowed us to gain insights into the specific biological processes and signaling cascades that are perturbed in each pattern, providing a deeper understanding of the functional consequences of m6A modification alterations in CRC. The results of this analysis were then used to inform our discussion of the potential mechanisms and clinical implications of our findings.

2.4 Differentially expressed genes (DEGs) identification between m6A distinct phenotypes

Based on the expression of the 20 m6A regulators, the “ConsensusClusterPlus” package was used to identify m6A-related genes and divide patients into two distinct patterns of m6A modification. DEGs between the modification patterns were determined with the empirical Bayesian method of the limma R package, using adjusted P < 0.01 and |logFC| > 1 as the criteria.
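The DEG criteria stated above can be sketched as a simple filter. This is a minimal Python illustration, not the limma analysis itself; the `GeneStat` record, the function `select_degs`, and all gene names and values are hypothetical and serve only to show how the two thresholds combine.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log_fc: float  # log fold change between the two m6A patterns
    adj_p: float   # multiple-testing adjusted P value


def select_degs(stats, adj_p_cutoff=0.01, log_fc_cutoff=1.0):
    """Keep genes with adjusted P < adj_p_cutoff and |logFC| > log_fc_cutoff."""
    return [
        s.gene
        for s in stats
        if s.adj_p < adj_p_cutoff and abs(s.log_fc) > log_fc_cutoff
    ]


# Hypothetical values for illustration only; not results from the study.
toy_stats = [
    GeneStat("GENE_A", 1.8, 0.0004),
    GeneStat("GENE_B", -1.3, 0.002),
    GeneStat("GENE_C", 0.6, 0.0001),  # fails |logFC| > 1
    GeneStat("GENE_D", 2.1, 0.03),    # fails adjusted P < 0.01
]
degs = select_degs(toy_stats)
print(degs)
```

Both conditions must hold at once, so a gene with a large fold change but a weak adjusted P value is excluded, and vice versa.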

Next, univariate Cox regression was used to identify DEGs associated with CRC prognosis. Based on these genes, ConsensusClusterPlus was applied again to stratify the CRC cohorts into two groups (termed gene clusters) with distinct features.

2.5 Generation of m6A gene signature

To quantitatively assess the unique m6A modification patterns exhibited by individual colorectal cancer (CRC) tumors, we devised a comprehensive scoring system tailored specifically for this purpose. This scoring framework, termed the “m6Ascore,” represents a gene signature that captures the essence of m6A modifications within CRC tumors.

The establishment of the m6A gene signature followed a rigorous and systematic protocol. As a starting point, we leveraged two distinct gene clusters, which were identified through sophisticated bioinformatic analyses of our CRC dataset. Within these clusters, we pinpointed differentially expressed genes (DEGs) that demonstrated significant prognostic value, employing both the limma package for differential expression analysis and the univariate Cox regression model for survival analysis. This initial screening step helped us narrow down the focus to those genes whose expression levels were most predictive of CRC outcomes.

Next, we examined the prognostic potential of these selected genes by subjecting them to multivariate Cox regression analysis. This approach allowed us to account for potential confounding factors and interactions among the genes, refining our list of prognostic candidates. To further refine our signature and enhance its predictive accuracy, we applied the Least Absolute Shrinkage and Selection Operator (LASSO) method. LASSO is a regression technique that automatically selects important variables and shrinks the coefficient estimates towards zero, thereby reducing the complexity of the model and improving its interpretability.
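To illustrate the shrinkage behavior described above, the following minimal Python sketch applies the soft-thresholding operator, the update that underlies coordinate-descent LASSO solvers. It is not the penalized Cox model fitted in this study; the function name, the coefficient values, and the penalty are hypothetical and chosen only to show how small coefficients are set exactly to zero while larger ones are shrunk.

```python
def soft_threshold(value, penalty):
    """Shrink a coefficient towards zero by `penalty`; zero it if |value| <= penalty."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


# Hypothetical unpenalized coefficients, for illustration only.
toy_coefficients = [0.8, -0.3, 0.05, -1.2]
toy_penalty = 0.25

shrunk = [soft_threshold(c, toy_penalty) for c in toy_coefficients]
selected = [i for i, c in enumerate(shrunk) if c != 0.0]
print(shrunk)
print(selected)
```

In this toy example the third coefficient falls below the penalty and is removed, which is the variable-selection effect that reduces model complexity.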

Through this rigorous and iterative process, we developed the m6A gene signature, embodied in the m6Ascore, which provides a quantitative assessment of the m6A modification patterns in individual CRC tumors. This signature serves as a valuable tool for investigating the role of m6A modifications in CRC biology and exploring their potential clinical implications [26]. The risk score of each CRC sample was determined using the formula: Risk score = ExpGene1 × CoefGene1 + ExpGene2 × CoefGene2 + … + ExpGene(n) × CoefGene(n). In this equation, “ExpGene” stands for the expression level of a gene and “CoefGene” for its regression coefficient.
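The risk score formula above is a weighted sum of expression values, which can be sketched in a few lines of Python. The function name `risk_score`, the gene names, the coefficients, and the expression values are hypothetical placeholders, not the signature genes or coefficients obtained in this study.

```python
def risk_score(expression, coefficients):
    """Sum of expression x coefficient over every gene in the signature."""
    missing = [gene for gene in coefficients if gene not in expression]
    if missing:
        raise KeyError(f"expression missing for signature genes: {missing}")
    return sum(expression[gene] * coef for gene, coef in coefficients.items())


# Hypothetical signature coefficients and one patient's expression profile.
toy_coefficients = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 1.0}
toy_patient = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.5, "GENE_D": 9.0}

score = risk_score(toy_patient, toy_coefficients)
print(score)  # 2.0*0.5 + 4.0*(-0.25) + 1.5*1.0 = 1.5
```

Genes outside the signature (here `GENE_D`) do not contribute, and a negative coefficient lowers the score as expression rises.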

2.6 Scoring of the m6A signature at the single-cell level

To calculate enrichment scores of the m6A signature in single cells, the AddModuleScore function of the Seurat package [27] was used. This function computes enrichment scores for predefined gene sets in single-cell RNA sequencing (scRNA-seq) data, providing a metric for evaluating the presence and abundance of specific biological signatures at the single-cell level.

Once the enrichment scores for the m6A signature were calculated, we turned to the SCP R package for visualization purposes. The SCP package offers a suite of tools for exploring and presenting scRNA-seq data in an informative and intuitive manner. By leveraging the capabilities of this package, we were able to visualize the enrichment of the m6A signature across different single-cell populations, revealing insights into the distribution and heterogeneity of m6A modifications at the cellular level. This approach enabled us to gain a deeper understanding of the role of m6A modifications in shaping the transcriptomic landscape of CRC and its implications for disease biology and progression.

2.7 Drug sensitivity analysis

The “OncoPredict” R package was used to analyze sensitivity to targeted and chemotherapeutic drugs [28]. OncoPredict models the relationship between drug IC50 values and gene expression patterns, drawing on data from the Broad Institute Cancer Cell Line Encyclopedia and the Genomics of Drug Sensitivity in Cancer. The immunophenoscores (IPS) of CRC patients were obtained from The Cancer Immunome Atlas (https://tcia.at/home). The IPS was used to evaluate each patient’s predicted response to immune checkpoint inhibitor-based immunotherapy (anti-CTLA-4 and anti-PD-1).

Subsequently, we stratified the CRC sample cohort into high-risk and low-risk groups, utilizing the median risk score value as the threshold for classification. This division allowed us to examine the distinct outcomes associated with differing levels of risk, as determined by our m6A signature.
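The median-based stratification described above can be sketched as follows. This Python illustration assumes that scores strictly above the median are labeled high-risk and all others low-risk; the text does not specify how ties at the median are handled, so that choice, the function name, and the patient identifiers and scores are all hypothetical.

```python
from statistics import median


def stratify_by_median(scores):
    """Label each sample 'high' or 'low' relative to the cohort median score."""
    cutoff = median(scores.values())
    # Assumption: ties at the median are assigned to the low-risk group.
    groups = {
        sample: ("high" if value > cutoff else "low")
        for sample, value in scores.items()
    }
    return groups, cutoff


# Hypothetical risk scores, for illustration only.
toy_scores = {"P1": 0.2, "P2": 1.4, "P3": 0.9, "P4": 2.3}
groups, cutoff = stratify_by_median(toy_scores)
print(cutoff)
print(groups)
```

Using the cohort median as the threshold yields groups of roughly equal size, which keeps both arms of the subsequent survival comparison well populated.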

To rigorously evaluate the prognostic potential and effectiveness of our signature, we employed two statistical methods: the log-rank test of the Kaplan-Meier (KM) survival curve and the time-dependent receiver operating characteristic (ROC) curve analysis. The log-rank test, a widely used statistical test for comparing survival curves between two or more groups, enabled us to assess whether there was a significant difference in survival outcomes between the high-risk and low-risk groups defined by our signature. Meanwhile, the time-dependent ROC curve analysis provided a comprehensive evaluation of the sensitivity and specificity of our signature in predicting survival outcomes at different time points, further demonstrating its effectiveness as a prognostic tool.
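For readers unfamiliar with the log-rank test mentioned above, the following Python sketch implements the standard two-group version: at each distinct event time it compares the observed number of events in one group with the number expected under equal hazards, and the resulting statistic is referred to a chi-square distribution with one degree of freedom. This is an illustrative re-implementation, not the R code used in the study, and the survival times are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Events: 1 = event observed, 0 = censored."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers still at risk in each group just before time t.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical follow-up times (all events observed), for illustration only.
high_risk_times = [1, 2, 3, 4, 5]
low_risk_times = [6, 7, 8, 9, 10]
chi2, p_value = logrank_test(high_risk_times, [1] * 5, low_risk_times, [1] * 5)
print(chi2, p_value)
```

In practice an established survival analysis library would be preferable; the sketch only makes the observed-versus-expected logic of the test explicit.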

2.8 Statistical analysis

Statistical analysis was conducted using R software version 4.1.3. To compare continuous data between two groups, we used the independent Student’s t-test for normally distributed variables or the Mann-Whitney U test for non-normally distributed variables. For categorical data comparisons, we used the χ2 test, which examines the association between categorical variables.

In the context of correlation analysis, we adopted a two-pronged approach based on the distribution of the variables. For normally distributed variables, we employed Pearson’s correlation test, which measures the linear correlation between two variables. Conversely, for non-normally distributed variables, we utilized Spearman’s correlation test, which assesses the strength and direction of a monotonic relationship between two variables without making assumptions about their underlying distributions.
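The difference between the two correlation measures described above can be seen in a small Python sketch. Spearman’s coefficient is computed here as Pearson’s coefficient applied to average ranks, which is its standard definition. The function names and the data are hypothetical; the data are deliberately monotonic but non-linear so that the two coefficients differ.

```python
import math


def pearson(x, y):
    """Pearson correlation: linear association between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical monotonic but non-linear data (y = x squared).
toy_x = [1, 2, 3, 4, 5]
toy_y = [1, 4, 9, 16, 25]
r_pearson = pearson(toy_x, toy_y)
r_spearman = spearman(toy_x, toy_y)
print(r_pearson, r_spearman)
```

Because the relationship is perfectly monotonic, the rank-based coefficient reaches 1, while the linear coefficient stays below 1.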

Statistical significance was defined as a two-tailed P-value less than 0.05. This conventional threshold in hypothesis testing indicates that an observed difference between groups, or an observed correlation, is unlikely to have occurred by chance.

3 Results

3.1 Landscape of m6A regulators’ genetic variation in CRC

We identified 20 m6A regulators, comprising 2 erasers, 7 writers, and 11 readers. We first compiled the copy number variations (CNVs) and somatic mutations of these m6A regulators in CRC. Of the 583 CRC samples, 143 had m6A regulator mutations, a frequency of 24.53%. The demethylase ALKBH5 had a low mutation frequency, whereas ZC3H13 and VIRMA showed the highest mutation frequencies (Fig. 1a). Examination of CNV alteration frequency showed that CNV alterations were common across the 20 regulators, and copy number amplification accounted for the bulk of them. CNV deletions, on the other hand, were observed in ALKBH5, HNRNPC, METTL14, METTL3, RBM15, YTHDC1, YTHDC2, and YTHDF2 (Fig. 1b). Using the merged TCGA-CRC and GTEx dataset, Fig. 1c shows the expression patterns of the 20 m6A regulators, demonstrating higher expression levels in CRC tumor tissues than in normal tissues. Similar findings are shown in Fig. 1d.
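The reported mutation frequency follows directly from the two sample counts given above, as this short Python check shows. The variable names are illustrative.

```python
mutated_samples = 143  # CRC samples with at least one m6A regulator mutation
total_samples = 583    # CRC samples examined

frequency_pct = 100 * mutated_samples / total_samples
print(f"{frequency_pct:.2f}%")  # prints 24.53%
```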

Fig. 1

Genetic variation of m6A regulators in CRC. a The incidence of mutations in 20 m6A regulators among 583 CRC patients from the TCGA-CRC cohort. Each column represents an individual patient. b The CNV variation frequency of the m6A regulators in the TCGA-CRC cohort. c A heatmap displaying the expression profiles of the 20 m6A regulators in tumor and normal colorectal tissues from the combined TCGA-CRC and GTEx cohort. d The difference in expression of the 20 m6A regulators between normal and CRC tissues in the TCGA-CRC cohort. Asterisks indicate the statistical p-value (**P < 0.01; ***P < 0.001)

3.2 20 regulators mediating patterns of m6A methylation modification

We used an m6A regulator network to depict the comprehensive landscape of m6A regulators in patients with CRC, encompassing their connections, interactions, and intercorrelations (Fig. 2a). Because VIRMA showed a comparatively high mutation frequency, we compared the expression of m6A regulator genes between VIRMA wild-type and VIRMA-mutant tumors. The results indicated that mutant tumors exhibited increased expression of numerous m6A regulators, including IGF2BP3, METTL14, HNRNPA2B1, and ALKBH5, relative to wild-type tumors (Fig. 2b). Based on the expression of the 20 m6A regulators, the ConsensusClusterPlus R package was used to categorize patients into qualitatively distinct patterns of m6A modification. Unsupervised clustering ultimately identified two distinct modification patterns, Cluster 1 (referred to as C1) and Cluster 2 (referred to as C2) (Fig. 3a–c). Significant variations were also seen in the m6A transcriptional profile between the two m6A modification patterns (Fig. 3d). Prognostic analysis of the two m6A modification subtypes revealed a significant survival advantage for the m6Acluster-C1 modification pattern (Fig. 4a).

Fig. 2

Interplay among m6A regulators in CRC. a Correlation between the 20 m6A regulators in CRC using data from the TCGA-CRC cohort. b Differences in m6A regulator expression between VIRMA-mutant and wild-type tumors

Fig. 3

Molecular clusters based on m6A. a Consensus score matrix of CRC samples in TCGA-CRC with the clustering number k set to 2. The consensus score indicates how consistently two samples were grouped together. b Cumulative distribution function (CDF) curves and c PAC scores of the consensus matrix for every value of k. d Gene expression patterns of m6A regulators and clinicopathological features across the clusters

Fig. 4

Clusters with distinct prognosis and TME landscapes. a Survival analysis between C1 and C2. b, c GSVA of KEGG (b) and GO-BP (c) terms between clusters. d Abundances of infiltrating immune cells between clusters. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. ns: not significant

3.3 TME cell’s infiltration characteristics in distinct modification patterns of m6A

Using GSVA enrichment analysis, we investigated the biological behaviors of the m6A modification patterns. m6Acluster-C2 showed marked enrichment in KEGG pathways, including cell cycle, basal transcription factors, ubiquitin-mediated proteolysis, and mismatch repair (Fig. 4b). Additionally, m6Acluster-C2 had substantially elevated activation levels for GO-BP terms, including RNA splicing, mRNA processing, regulation of mRNA processing, and mRNA metabolic process (Fig. 4c).

Decoding the TME contexture with the CIBERSORT method revealed that m6Acluster-C2 was characterized by high infiltration of memory B cells, M0 macrophages, resting NK cells, and activated memory CD4+ T cells. In contrast, m6Acluster-C1 was characterized by enrichment of regulatory T cells and plasma cells (Fig. 4d).

3.4 Generation of m6A gene signatures

To examine the potential biological function of each m6A modification pattern, we identified the intersecting m6A phenotype-associated DEGs using the limma package. Subsequently, a univariate Cox regression prognostic analysis was performed for every DEG. The 20 genes with significant prognostic value are shown in Fig. 5a. We next used unsupervised clustering analysis to classify patients into distinct genomic subgroups based on these 20 m6A phenotype-associated genes. The results of the unsupervised clustering were in line with the grouping obtained from the m6A modification patterns. Two distinct m6A modification genomic phenotypes, termed m6A gene clusters 1 and 2, were identified (Fig. 5b–d). This indicates that m6A methylation modification occurs in CRC in two different patterns. In m6A gene cluster 2, the 20 m6A regulators were upregulated (Fig. 5e). Additionally, CRC patients in gene cluster 1 had a better prognosis than those in gene cluster 2 (Fig. 5f).

Fig. 5

m6A-based gene clusters. a Forest plot illustrating the 20 prognostic m6A pattern-derived DEGs. b Consensus score matrix of CRC samples in the TCGA-CRC dataset with a clustering number of 2. Each element in the matrix is a consensus score, which indicates how consistently two samples were grouped together. c, d The CDF curves (c) and PAC scores (d) of the consensus matrix for each k. e Expression profiles of m6A regulators and clinicopathological characteristics between gene clusters. f Survival analysis between gene cluster 1 and gene cluster 2

3.5 Generation of m6A score

Using LASSO and multivariate Cox regression analysis, we generated an m6A-related signature and an m6A score for each CRC patient, based on the prognostic DEGs between the two m6A gene clusters (Fig. 6a and b). The prognostic performance of the m6A score in the four distinct cohorts was assessed using KM survival analysis and time-dependent ROC analysis (Fig. 6c–f).

Fig. 6

Development of the m6A-related signature. a, b The selection of prognostic hub genes based on the optimal parameter λ obtained in the LASSO regression analysis. c–f Survival analysis and prognostic performance of the signature in the four cohorts

The scoring system known as m6Ascore was created to examine the complexity and individual heterogeneity of m6A alteration under the R 4.3.1 analysis environment. This approach evaluates the m6A alteration pattern in CRC patients using principal component analysis (PCA). The m6Ascore differed noticeably between the m6A clusters: Cluster 2 exhibited a higher score, whereas Cluster 1 displayed a lower score (Fig. 7a). Similarly, m6A gene cluster 2 had a significantly higher m6Ascore than m6A gene cluster 1 (Fig. 7b).

Fig. 7

Characteristics of m6A modification in tumor somatic mutation. a, b Differences in m6A score between the two m6A modification patterns (a) and the two gene clusters (b). c Variations in microsatellite subtypes between the high and low m6Ascore groups. MSS, MSI-L, and MSI-H stand for microsatellite stable, low microsatellite instability, and high microsatellite instability, respectively. d Differences in TMB between the high and low m6Ascore groups in the TCGA-CRC cohort. e, f Waterfall plots of tumor somatic mutations in TCGA-CRC patients with high and low m6Ascore. Each column represents a distinct patient

3.6 Characteristics of m6A modification in tumor somatic mutation

Subsequently, we examined the correlation between m6Ascore and microsatellite instability (MSI). The findings showed that the high m6Ascore group had a higher proportion of MSI-H than the low m6Ascore group (Fig. 7c). Next, differences in somatic mutation distribution between the high and low m6Ascore groups in the TCGA-CRC cohort were examined using the maftools package. The high m6Ascore group showed a greater tumor mutation burden (TMB) than the low m6Ascore group (Fig. 7e, f). TMB quantification confirmed the substantial association between high m6Ascore and greater TMB (Fig. 7d).

3.7 Characteristics of m6A modification in the immunotherapy

Subsequent findings revealed that the high and low m6Ascore groups had different PD-L1 expression patterns, with the former showing high expression (Fig. 8a), suggesting a possible response to anti-PD-1/L1 therapy. We then investigated whether the m6A modification signature could predict patients’ response to immune checkpoint blockade (ICB). We obtained CRC immunophenoscore (IPS) data from The Cancer Immunome Atlas (TCIA; https://tcia.at/home). The analysis revealed that, for anti-CTLA4 and/or anti-PD-1 treatment, patients in the low m6Ascore group had better results than patients in the high m6Ascore group (Fig. 8b).

Fig. 8

Characteristics of m6A modification in immunotherapy. a Differences in PD-L1 expression between the low and high m6Ascore groups. b Differences in immunophenoscore between the high and low m6Ascore groups, based on data from TCIA (https://tcia.at/home)

As previously reported, distinct sensitivities to ICB may be associated with markedly different TME features. Resting NK cells, M0 macrophages, and naïve B cells were more abundant in CRC patients with high m6Ascore, while activated mast cells and neutrophils were more abundant in patients with low m6Ascore (Fig. 9a). Furthermore, analysis of chemokine genes and immune checkpoint protein-encoding genes showed that high m6Ascore was associated with elevated levels of these markers (Fig. 9b, c).

Fig. 9

TME phenotypes between risk groups. a Box plot showing the distribution of 22 types of immune cells between the two risk groups based on CIBERSORT. b, c Box plots showing the differences in the expression levels of chemokine genes and immune checkpoint protein-encoding genes between the two risk groups

3.8 Characteristics of m6A modification in cancer hallmarks

To uncover the dysregulated pathways between m6A phenotypes, we identified significantly upregulated genes in m6A gene cluster 2 using the limma R package. Using the GSEA method, we found that hypoxia, TGFβ signaling, hedgehog signaling, KRAS signaling, TNFα signaling via NF-κB, and estrogen response pathways were activated in the high m6A score group, whereas fatty acid metabolism, oxidative phosphorylation, and MYC targets V1 were downregulated in this group (Fig. 10).

Fig. 10

Dysregulated cancer hallmarks between two risk groups

3.9 Characteristics of m6A modification in the chemotherapy

We investigated whether the m6A modification signature could be used to predict a patient’s sensitivity to chemotherapy. The analysis showed that patients with low m6A scores would be better candidates for treatment with vinblastine, paclitaxel, docetaxel, and cisplatin than those with high m6A scores (Fig. 11a). Further analysis of the mRNA expression-based stemness index (mRNAsi) revealed that CRC patients with high m6A scores had significantly lower mRNAsi than those with low m6A scores (Fig. 11b).

Fig. 11

Therapeutic sensitivity between the two risk groups. a Estimated IC50 of the four chemotherapeutic agents in the two risk groups. b Violin plot of the mRNAsi index in the two risk groups

3.10 Characteristics of m6A modification at single-cell levels

To elucidate the features of the m6A signature at the single-cell level in CRC, we obtained and integrated three datasets comprising 110,802 cells (Fig. 12a). At a resolution of 0.6, 26 cell clusters were identified and annotated as 17 major cell sets (Fig. 12b, c). Markers of each cell set are shown in Fig. 12d. Using the AddModuleScore() function, we quantified the m6A signature score of each CRC cell and found that the score was elevated in the malignant cell set (Fig. 12e).

Fig. 12

The highly activated m6A-related signature in scRNA-seq datasets of CRC. a UMAP visualization of 110,802 cells from three public CRC scRNA-seq cohorts. b 17 major cell types were manually annotated. c UMAP visualization of the 17 major cell sets across the three scRNA-seq datasets. d Violin plots illustrating the expression values of cell type-specific markers. e Expression of the signature genes at the single-cell level, determined by the AddModuleScore() function in Seurat

4 Discussion

Colorectal cancer is the second most common cause of cancer-related death [29]. Despite significant advances in treatment modalities over the past few decades, such as refinements in surgical techniques, radiotherapy, chemotherapy, targeted therapy, and immunotherapy, the efficacy of treatment remains constrained by low early detection rates and a lack of personalized therapeutic strategies [30,31,32,33,34,35]. Further research into the mechanisms underlying the formation and progression of colorectal cancer is required to provide a theoretical foundation for diagnosis, therapy, and prognostic evaluation. Our study employed a large sample size and integrated multiple independent datasets for comprehensive analysis. This large-scale approach enabled us to identify more refined and stable m6A modification patterns, reducing the impact of random errors on our results.

We retrieved data on various types of CRC from several sources to examine the expression patterns of 20 m6A regulators. Regarding copy number variations, most regulators exhibited amplification. Because most of the regulators in the tumor group have high expression levels, we hypothesize that copy number amplification plays a role in the higher expression levels of these regulators in tumors. Additionally, various regulators showed different degrees of mutation. VIRMA exhibited a relatively high mutation frequency, and in its mutated group, expression of other m6A regulatory factors was upregulated. Studies have suggested that writers form complexes that act synergistically, with VIRMA serving as a scaffold protein that brings together WTAP, HAKAI, and ZC3H13, creating a binding site for METTL3 and METTL14, thereby facilitating their optimal catalytic function [36]. We hypothesize that mutations and copy number variations in VIRMA promote its overexpression and enhance its ability to interact synergistically with other proteins in the complex, although further research is needed to confirm our conjecture.

To confirm the association between m6A score and chemotherapy sensitivity, it is necessary to validate this finding in large-scale, multi-center clinical trials. This can ensure the reliability and reproducibility of the results. In a multi-center clinical trial involving thousands of breast cancer patients, researchers collected data on patients’ m6A scores and recorded their treatment responses to paclitaxel. Through statistical analysis, they found that patients with higher m6A scores did indeed exhibit greater chemotherapy sensitivity, a result that was previously validated in smaller-scale studies.

Beyond observational studies, it is also crucial to conduct mechanistic research to uncover the specific pathways through which m6A score influences chemotherapy sensitivity. Researchers, utilizing in vitro cell experiments and animal models, discovered that m6A modifications can impact the expression of certain genes related to chemotherapy sensitivity, such as by regulating apoptotic pathways or the activity of drug metabolism-related enzymes. These findings provide mechanistic support for the potential of m6A score as a predictive indicator of chemotherapy sensitivity.

Ultimately, the clinical application value of m6A score must be evaluated, including its predictive accuracy, cost-effectiveness, and impact on patient treatment decisions. In a prospective study, researchers developed personalized chemotherapy regimens based on patients’ m6A scores. The results showed that patients who received personalized treatments demonstrated advantages in terms of survival rates, quality of life, and other metrics compared to those who did not have their treatment guided by m6A scores. This indicates that m6A score holds significant potential for clinical application.

Our comprehensive investigation will cover a diverse array of epigenetic alterations, including DNA methylation, histone modifications, non-coding RNAs, and, notably, m6A modifications. By comparing the insights from these distinct yet interconnected lines of study, we aim to clarify the similarities and fundamental differences among these epigenetic modulators in colorectal cancer (CRC). We will focus on the unique contribution of each epigenetic mark to CRC biology, examining their roles in gene expression regulation, cell proliferation, tumor invasion, metastasis, and treatment responsiveness.

Furthermore, we will delve into the implications of our findings for advancing CRC research and clinical practice. Specifically, we will contemplate how a deeper understanding of the functional roles of various epigenetic modifications in CRC can inspire the conception and refinement of novel diagnostic tools, prognostic indicators, and therapeutic interventions. This includes exploring the potential to harness epigenetic alterations as predictive biomarkers or therapeutic targets.

Additionally, we will open the discussion to prospective research avenues, emphasizing the need to unravel the interplay, or “crosstalk,” between m6A modifications and other epigenetic pathways in CRC. Such inquiries may uncover novel regulatory mechanisms and reveal unexpected therapeutic vulnerabilities, thereby expanding the horizons of CRC research and treatment options.

We divided all patients into two groups according to their m6A regulator expression patterns using unsupervised clustering. Cluster 1 (C1) showed a significant survival advantage. Comparing the two clusters, we observed significant enrichment and activation of pathways related to proliferation, protein metabolism, RNA processing, and others in Cluster 2 (C2). This suggests that cancer cells in C2 are more active and more prone to progression. Immune infiltration analysis revealed less active immune cell infiltration in C2 than in C1, indicating weaker antitumor immunity. Taken together, we propose that active proliferation and an immunosuppressive microenvironment may contribute to the inferior prognosis of C2 compared with C1.

Next, we identified two m6A gene clusters from the m6A modification patterns using unsupervised clustering techniques. In m6A gene cluster 2, 20 m6A regulators were significantly upregulated, and patients in this cluster had a poorer prognosis.

We used LASSO and multivariate Cox regression analysis to build an m6A-related signature and to compute an m6A score for every CRC patient, based on the prognostic differentially expressed genes between the two m6A gene clusters. This score showed promising prognostic and diagnostic performance. Patients were split into high and low m6A score groups, with the median serving as the cutoff. Tumor mutation burden and the frequency of MSI-H were higher in patients with elevated m6A scores. High microsatellite instability (MSI-H) results from the inability of the mismatch repair (MMR) system to correct replication errors in short repetitive DNA sequences. The MSI-H phenotype in colorectal cancer is linked to proximal tumor location, a high density of local lymphocyte infiltration, and a low rate of distant organ metastasis. Additionally, the prognosis for MSI-H colorectal cancer is better than that of microsatellite-stable (MSS) cancer. However, other studies indicate that patients with MSI-H colorectal cancer respond less favorably to chemotherapy, potentially because a higher mutation rate leads to the emergence of drug-resistant clones. In the era of immunotherapy, MMR-deficient (MMRd) tumors are increasingly recognized as hypermutators that elicit enhanced antitumor immune responses, which may be further augmented by checkpoint inhibition. Importantly, frameshift mutations generated in MMRd tumors alter the entire coding sequence downstream of the mutation site and are therefore more immunogenic, whereas point mutations can generate novel antigens only at the mutation site. Consequently, it is hypothesized that the high frameshift mutation rate of MMRd tumors, rather than their total tumor mutation burden, may be more responsible for the notable pan-cancer efficacy of checkpoint inhibitors [37]. Subsequent immune-related analyses corroborated this observation. Moreover, the types and abundance of infiltrating immune cells varied across m6A score levels, possibly contributing to differential responses to immune checkpoint blockade (ICB). Therefore, further investigation into the differences in the immune microenvironment under different m6A modification levels is warranted to design personalized treatment strategies for optimal ICB efficacy.
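The difference between a point mutation and a frameshift mutation described above can be made concrete with a toy example. The sketch below translates a short, made-up coding sequence with the standard genetic code and counts how many residues change under each mutation type. The sequence and the helper names `translate` and `changed_residues` are hypothetical and serve only as an illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate complete codons only; a trailing partial codon is dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def changed_residues(ref, alt):
    """Count positions that differ, plus any difference in length."""
    mismatches = sum(a != b for a, b in zip(ref, alt))
    return mismatches + abs(len(ref) - len(alt))

# Made-up 18-nt coding sequence (6 codons), for illustration only.
reference = "ATGGCTGAAACTGGTCAT"
point_mutant = reference[:3] + "C" + reference[4:]   # substitution at index 3
frameshift_mutant = reference[:4] + reference[5:]    # 1-bp deletion at index 4

ref_protein = translate(reference)
point_protein = translate(point_mutant)
shift_protein = translate(frameshift_mutant)

n_point = changed_residues(ref_protein, point_protein)
n_shift = changed_residues(ref_protein, shift_protein)
```

In this toy case the substitution alters a single residue, whereas the one-base deletion changes every residue downstream of the mutation site, which is the property that the frameshift hypothesis relies on.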

Our m6A score also predicts patients’ sensitivity to chemotherapy. Patients with lower m6A scores may be preferred candidates for treatments such as oxaliplatin, paclitaxel, docetaxel, and cisplatin. Our scoring system provides guidance for personalized chemotherapy regimen selection. More extensive investigations with larger sample sizes are necessary to further validate our findings. Additionally, patients with higher m6A scores exhibited significantly lower mRNA stemness index (mRNAsi), indicating higher differentiation levels and relatively lower malignancy. Thus, our m6A score holds great potential in predicting tumor malignancy and guiding treatment strategies.
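The grouping that underlies these comparisons can be sketched in a few lines. The example assumes the usual form of a Cox-derived signature, in which each patient's score is a coefficient-weighted sum of the signature genes' expression values, followed by the median split described above. The coefficients and expression values are hypothetical and are not the fitted values from this study.

```python
import numpy as np

def m6a_score(expr, coefs):
    """Weighted sum of signature-gene expression for each patient.

    expr:  (n_patients, n_genes) expression matrix of the signature genes.
    coefs: (n_genes,) regression coefficients (hypothetical values here).
    """
    return np.asarray(expr) @ np.asarray(coefs)

def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median score."""
    cutoff = np.median(scores)
    groups = np.where(scores > cutoff, "high", "low")
    return groups, cutoff

# Hypothetical data: 4 patients, 3 signature genes.
expr = np.array([
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 3.0],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
])
coefs = np.array([0.5, -0.2, 0.3])

scores = m6a_score(expr, coefs)
groups, cutoff = split_by_median(scores)
```

A median cutoff yields balanced groups by construction, but it is cohort-specific: applying the score to a new cohort requires either reusing the original cutoff or recomputing it, and the two choices can assign the same patient to different groups.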

Our study still has certain limitations. Although our sample size is relatively large, it still falls short of what would be needed to generalize our findings to the entire population. Furthermore, our study lacks relevant real-world validation. In the future, we will focus on addressing these issues and further demonstrate the value of our m6A score in CRC. Collaborating with multiple medical centers to conduct multicenter, large-sample studies will further enhance the reliability and generalizability of our conclusions.