Background

Breast cancer is the most prevalent cancer worldwide [1, 2]. Epidemiological and pathological studies have shown that breast tumorigenesis is a stepwise process, progressing from usual ductal hyperplasia (UDH) to atypical ductal/lobular hyperplasia (ADH/ALH), ductal/lobular carcinoma in situ (DCIS/LCIS) and invasive ductal breast cancer (IDC) [3,4,5,6]. In general, patients diagnosed with ADH/ALH or LCIS have a 4- to 10-fold increased risk of IDC [7,8,9]. Although advances in imaging techniques have improved the diagnosis of breast tumors, there is still a need for efficient biomarkers to distinguish tumors with an elevated risk of malignant transformation [10]. Therefore, identifying the molecular determinants underlying the progression of breast precancerous lesions holds immense promise for the development of early diagnostic and predictive biomarkers.

The maintenance of dynamic equilibrium between DNA methylation and demethylation is crucial in numerous physiological processes [11, 12]. In contrast, extensive studies have demonstrated that cancer cells exhibit aberrant DNA methylation and 5-hydroxymethylation patterns [13,14,15,16,17]. To date, both 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) signatures have emerged as pivotal epigenetic regulators in cancer initiation and progression. Intriguingly, in addition to primary tumors, mounting evidence suggests that 5hmC in cell-free DNA (cfDNA) is tissue-specific and holds promise as a biomarker for cancer detection [18,19,20,21].

Previously, epigenetic modulations, encompassing DNA methylation, have been recognized as pivotal contributors to breast tumorigenesis [22]. Recent investigations have expanded beyond the global changes of DNA 5hmC in breast cancer to explore the genomic landscape of 5hmC in breast tissues and breast cancers (DCIS/IDC) [23, 24]. Notably, breast cancer and precursor lesions exhibit a significant reduction in the abundance of 5hmC compared to normal tissues [25,26,27,28]. Moreover, genomic 5hmC exhibits dynamic changes in various stages of lymph node metastasis in breast cancer [29]. However, although the involvement of DNA 5hmC and 5mC in breast tumors has been widely demonstrated [30], it remains unclear whether dysregulated DNA 5hmC and 5mC are involved in the progression of breast precancerous lesions. Furthermore, the majority of methylated loci identified in previous reports are primarily based on methylation-specific PCR [31,32,33] and Human Methylation450 microarray [34,35,36,37]. Thus, our understanding of the genome-wide characteristics of DNA 5mC and 5hmC in breast precancerous lesions and the role of active DNA demethylation in breast tumorigenesis is still lagging behind.

In this study, we comprehensively analyzed the dynamic landscapes of DNA 5hmC, spanning from early precancerous lesions to malignant tumors, and investigated the intricate crosstalk between 5hmC and 5mC in breast precancerous lesions. Furthermore, through in silico analyses, we unveiled the potential role of TET2 in collaboration with transcription factors (TFs) in influencing dynamic DNA demethylation and propelling breast tumorigenesis. Finally, we identified hydroxymethylated regions in cfDNA that hold promise for application in breast cancer screening.

Methods

Collection of human breast tumor samples

Samples of precancerous lesions, including usual ductal hyperplasia (UDH), atypical ductal hyperplasia (ADH), and ductal carcinoma in situ (DCIS), as well as invasive ductal breast cancer (IDC), were obtained from patients who underwent surgery at the First Affiliated Hospital of China Medical University (Table 1). Detailed information on the clinical samples used for hMeDIP-seq/MeDIP-seq/RNA-seq in this study is listed in Table 2.

Table 1 Sample information and number of clinical samples used in this study
Table 2 Characteristics of the patients whose samples were used for genomic sequencing

Immunohistochemical staining analysis

Paraffin-embedded sections (4 μm) were used for immunohistochemical (IHC) staining. After deparaffinization and antigen retrieval, tissue sections were incubated with primary antibodies at 4 °C overnight and with secondary antibodies at 37 °C for 2 h. All slides were then counterstained with hematoxylin. The antibodies used in this study were anti-5hmC (Active Motif, 39769, 1:2000) and ImmPRESS™ horse anti-rabbit IgG (Vector, MP-7401).

All images were acquired using the TissueFAXS cell analysis system (TissueGnostics, Austria). In each slide, we randomly selected more than three regions (> 1 mm²) that were enriched with ducts for quantitative analysis. According to the degree of cell aggregation and the length/width ratio of each cell, the nuclei of luminal cells and cancer cells in each duct were identified and marked. The DAB staining intensity of each cell was measured using HistoQuest software. The score of each region was calculated as the average staining intensity of all selected cells in that region. The mean score of all selected cells was recorded as the relative level of DNA 5hmC in each sample.
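The scoring described above can be sketched in a few lines. This is a minimal illustration, not the HistoQuest implementation: the function names are hypothetical, and the sample-level value is assumed to be the mean over all selected cells pooled across regions, which is one reading of the text.

```python
from statistics import mean


def region_score(cell_intensities):
    """Average DAB staining intensity over the marked cells of one region."""
    return mean(cell_intensities)


def sample_5hmc_level(regions):
    """Relative 5hmC level of a sample.

    Assumption: the sample value is the mean intensity over all selected
    cells, pooled across every analysed region of the slide.
    """
    all_cells = [value for region in regions for value in region]
    return mean(all_cells)


# Hypothetical intensities for two duct-enriched regions of one slide.
regions = [[10.0, 20.0], [30.0, 40.0, 50.0]]
per_region = [region_score(r) for r in regions]
sample_level = sample_5hmc_level(regions)
```

Pooling cells weights each region by its cell count; averaging the per-region scores instead would weight regions equally, so the two choices can differ when regions contain different numbers of cells.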

Histological identification and manual macro-dissection

All clinical tumor samples were collected after pathological diagnosis. Fresh frozen samples were first cut into 8-μm tissue sections and then subjected to hematoxylin and eosin (H & E) staining. Each slide was examined independently by two pathologists for histological identification. Ultimately, lesions with a definite diagnosis of UDH, ADH, DCIS, or IDC and without excessive infiltrating lymphocytes (roughly < 10%) were subjected to hMeDIP-seq/MeDIP-seq. For macro-dissection, 8-μm sections were used for pathological identification, while the adjacent serial 30-μm fresh frozen tissue sections were subjected to H & E staining and manual macro-dissection under a stereomicroscope [29]. Macro-dissected ducts/cells were collected and used for DNA and RNA extraction. For samples used for RNA extraction, all reagents and consumables were pretreated to remove RNase, and all procedures were performed at 4 °C.

LC–MS/MS analysis

The 5hmC and 5mC content in cells was quantified by LC–MS/MS as described previously [38]. Briefly, genomic DNA obtained from 48 samples was first digested into single nucleosides with DNA Degradase Plus (Zymo Research, E2021). Subsequently, the nucleosides and labeled products were analyzed with a Thermo Scientific Dionex Ultimate 3000 HPLC coupled to a Triple Quad™ 5500 mass spectrometer with an ESI source.

DNA extraction and hMeDIP-seq/MeDIP-seq

A total of 3.0 μg of intact genomic DNA per sample was used for hydroxymethylated/methylated DNA immunoprecipitation sequencing (hMeDIP-seq/MeDIP-seq). Genomic DNA (1.5 μg) mixed with 5hmC or 5mC spike-in DNA control (5hmC: Zymo Research, D5405-3, 1:20000; 5mC: Wise gene, S001, 1:200) was fragmented (100–250 bp) and ligated with Illumina barcode adapters. Input DNA was used as a control to determine the enrichment ratio of 5hmC/5mC-modified DNA. The immunoprecipitation reaction was performed by mixing DNA with 5hmC/5mC antibody (5hmC: Active Motif, 39769; 5mC: Abcam, ab10805) and protein A/G beads for 2 h. The immunoprecipitated 5hmC/5mC-containing DNA fragments were purified using the QIAGEN MinElute PCR Purification Kit (QIAGEN, 28004). All immunoprecipitated products and input DNA were subjected to amplification, size selection (275–475 bp), purification (QIAGEN, 28704), and quality control. All samples (UDH: 3 cases, ADH: 3 cases, DCIS: 3 cases) were sequenced on the Illumina HiSeq X Ten system. The hMeDIP-seq data of early-stage invasive ductal breast cancer were described in our previous report (GSA: CRA001593) [29].

Reads mapping

First, raw reads were processed with Trimmomatic (Version 0.33) using default parameters to remove sequencing adaptors and low-quality bases [39]. The clean reads were mapped to the hg19 genome with Bowtie2 (Version 2.3.2) using default parameters [40]. Samtools (Version 1.9) [41] was then used to remove duplicated and unpaired reads.

Peak calling and annotation

Whole-genome scanning of hydroxymethylated/methylated regions (hMRs/MRs) was conducted using MACS2 (Version 2.1.1) [42]. Differentially hydroxymethylated/methylated regions (DhMRs/DMRs) were identified using the DiffBind package (Version 3.8) in R with the thresholds P value < 0.05 and |Log2(foldchange)| > 1 [43, 44]. To determine hydroxymethylated/methylated genes (hMGs/MGs) and differentially hydroxymethylated/methylated genes (DhMGs/DMGs), hMRs, MRs, DhMRs and DMRs were annotated to genomic regions and corresponding genes with the ChIPseeker R package (Version 1.36.0) [45].
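The thresholds used to call a region differential can be expressed as a small filter. The sketch below is illustrative only and is not DiffBind code: the function name is hypothetical, and the sign convention (a positive log2 fold change means higher signal in the later stage) is an assumption.

```python
def classify_region(p_value, log2_fold_change, p_cutoff=0.05, lfc_cutoff=1.0):
    """Label a region 'hyper', 'hypo', or 'ns' from its differential statistics.

    Assumption: a positive log2 fold change means higher signal in the
    later of the two stages being compared.
    """
    if p_value < p_cutoff and abs(log2_fold_change) > lfc_cutoff:
        return "hyper" if log2_fold_change > 0 else "hypo"
    return "ns"


# Hypothetical (P value, log2 fold change) pairs for four regions.
stats = [(0.01, 1.5), (0.01, -2.0), (0.20, 3.0), (0.01, 1.0)]
labels = [classify_region(p, lfc) for p, lfc in stats]
```

Both inequalities are strict, so a region with a log2 fold change of exactly 1 is not called differential.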

Continually hyper-/hypo-hydroxymethylated regions during tumorigenesis

We first performed differential hydroxymethylation analysis on samples from adjacent stages. Regions showing continual 5hmC accumulation (P value < 0.05 and |Log2(foldchange)| > 1) were identified as continually hyper-hydroxymethylated regions; the same criteria were applied to identify continually hypo-hydroxymethylated regions. For visualization, we averaged the signal values of the biological replicates of each stage and normalized them across all stages so that the highest value equals 1.
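A minimal sketch of this selection and of the scaling used for visualization is given below. It assumes that "continual" means the region is called in the same direction in every adjacent-stage comparison; the function names and the dictionary-based data layout are hypothetical.

```python
def continual_regions(phase_calls, direction="hyper"):
    """Regions called in the same direction in every adjacent-stage comparison.

    phase_calls: one dict per comparison, mapping region id to
    'hyper', 'hypo', or 'ns' (assumed input layout).
    """
    shared = set(phase_calls[0])
    for calls in phase_calls[1:]:
        shared &= set(calls)
    return sorted(r for r in shared
                  if all(calls[r] == direction for calls in phase_calls))


def normalize_to_max(stage_replicates):
    """Average replicates per stage, then scale so the highest stage mean is 1."""
    means = {stage: sum(vals) / len(vals)
             for stage, vals in stage_replicates.items()}
    top = max(means.values())
    if top == 0:
        # No signal at any stage: return zeros instead of dividing by zero.
        return {stage: 0.0 for stage in means}
    return {stage: m / top for stage, m in means.items()}


# Hypothetical calls for three regions across three adjacent-stage comparisons.
calls = [
    {"r1": "hyper", "r2": "hyper", "r3": "hypo"},
    {"r1": "hyper", "r2": "ns", "r3": "hypo"},
    {"r1": "hyper", "r2": "hyper", "r3": "hypo"},
]
gained = continual_regions(calls, "hyper")
lost = continual_regions(calls, "hypo")
scaled = normalize_to_max({"UDH": [1.0, 3.0], "ADH": [4.0, 4.0]})
```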

KEGG pathway enrichment and GSEA analyses

KEGG pathway enrichment analysis of the selected genes was carried out using the DAVID tool [39]. Pathways with an FDR < 0.05 were considered significantly enriched. In addition, breast cancer-related gene sets were selected from MSigDB and analyzed by gene set enrichment analysis (GSEA), with NES > 1 and Q value < 0.05 as the significance criteria.

RNA extraction and RNA-seq

Total RNA was extracted with TriReagent (Sigma, T92424) from macro-dissected samples paired with those used for hMeDIP-seq/MeDIP-seq. TruSeq RNA library preparation was used for the UDH1, UDH2, DCIS1, and DCIS2 samples; Ribo-Minus RNA library preparation was used for the ADH2 and ADH3 samples. Subsequently, RNA sequencing on the Illumina HiSeq X Ten system was performed as described previously [40].

Analysis of RNA-seq data

For the comparison of RNA abundance between adjacent stages during breast tumorigenesis, we downloaded RNA-seq data from the GEO database (GSE47462) [42]. Normalized read counts were input to the Limma package (Version 3.56.2) [44] for differential expression analysis.

To further explore the effect of DNA epigenetic modifications on transcriptional regulation, raw RNA-seq data from a subset of our own paired samples were first subjected to quality control with the FastQC tool and then mapped to the hg19 genome with Bowtie2 (Version 2.3.2) using default parameters. The expression level of each RNA was quantified as Fragments Per Kilobase of transcript per Million mapped reads (FPKM).
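For reference, the standard FPKM definition can be written as a one-line function. This sketch only illustrates the formula; the function and parameter names are hypothetical, and the software actually used to compute FPKM is not stated in the text.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    """
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical transcript: 100 fragments, 2 kb long, 10 million mapped fragments.
example_value = fpkm(100, 2000, 10_000_000)
```

The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings.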

Correlation analysis between RNA expression and DNA modification level

To assess the correlation between RNA expression and DNA methylation/hydroxymethylation levels, we divided the RNAs in each sample into three equally sized groups according to the tertiles of their expression levels, and then generated average profiles of all 5hmC/5mC peaks along the genes encoding the RNAs in each group.
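The grouping step can be illustrated with a rank-based split. This is a sketch under assumptions: the function name is hypothetical, ties are broken by gene name purely to make the result deterministic, and when the number of genes is not divisible by three the groups differ in size by at most one.

```python
def split_into_tertiles(expression):
    """Split genes into low/medium/high expression groups of near-equal size.

    expression: dict mapping gene name to expression level in one sample.
    """
    ranked = sorted(expression, key=lambda gene: (expression[gene], gene))
    n = len(ranked)
    first, second = n // 3, (2 * n) // 3
    return {
        "low": ranked[:first],
        "medium": ranked[first:second],
        "high": ranked[second:],
    }


# Hypothetical FPKM values for six genes in a single sample.
sample_expression = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "e": 5.0, "f": 6.0}
groups = split_into_tertiles(sample_expression)
```

Because the split is repeated for every sample, a gene can fall into different groups in different samples.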

ChIP-seq analysis of TET2, histone marks, and TFs

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) data for TET2 were obtained from the GEO database (GSE153251) [46], and genomic coordinates were converted from hg38 to hg19 with UCSC liftOver. To obtain the complete set of TET2-binding genomic regions, we integrated ChIP-seq data from three experiments performed on unperturbed MCF7 cells. DhMRs/DMRs that overlapped with TET2-binding regions were defined as TET2-binding DhMRs/DMRs. ChIP-seq data for TFs (ESR1, GATA3, FOXA1, FOS, FOSL2, FOXM1, JUNB) and histone marks (H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K4me3, H3K9me3) were downloaded from ENCODE [47] in .bigwig format and analyzed with deepTools (Version 3.5.1).
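One plausible way to integrate peaks from several experiments is to take their union and merge overlapping intervals. The text does not state how the three TET2 experiments were combined, so the union rule, the function name, and the half-open (chrom, start, end) tuples below are all assumptions made for illustration.

```python
def merge_peak_sets(*peak_sets):
    """Union of peaks from several experiments with overlapping peaks merged.

    Each peak is a (chrom, start, end) tuple with half-open coordinates.
    Peaks that merely touch (start == previous end) are kept separate.
    """
    peaks = sorted(peak for peak_set in peak_sets for peak in peak_set)
    merged = []
    for chrom, start, end in peaks:
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical peaks from three experiments.
exp1 = [("chr1", 100, 200)]
exp2 = [("chr1", 150, 300), ("chr2", 50, 80)]
exp3 = [("chr1", 400, 500)]
tet2_regions = merge_peak_sets(exp1, exp2, exp3)
```

An intersection-based rule (keeping only regions bound in every experiment) would give a stricter and smaller region set.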

Motif analysis

The motif of TET2-binding DhMRs/DMRs in each phase was identified with Homer (Version 4.11.1) [48] with default parameters. For visualizing the binding motifs of TET2-binding regions overlapped with TFs, the P values from motif enrichment analysis were used in the heatmap.

Visualization

The average distribution of read counts per million along genes was displayed from 3 kb upstream of transcription start sites (TSSs) to 3 kb downstream of transcription end sites (TESs) using deepTools [49]. Clustering and heatmap plotting of hMRs were conducted with the pheatmap package (Version 1.0.10) in R. KEGG enrichment plots, volcano plots, and boxplots in this paper were all made with the ggplot2 R package (Version 3.1.0) [50].

For the average plots showing the enrichment of different proteins (histone marks and transcription factors) flanking specific genomic regions (such as DhMRs/DMRs), we used the multiBigwigSummary function in deepTools to first summarize the binding signal of the corresponding protein in equally sized bins and then plotted the enrichment value of each bin.

DhMRs analysis of cfDNA

Shotgun sequencing data of 5-hydroxymethylated cfDNA from blood samples of healthy controls and breast cancer patients were downloaded from the GEO database (GSE81314) [50]. DhMRs in cfDNA were identified using the same criteria as for primary tumor samples. The overlap between DhMRs in cfDNA and those in primary tumors was quantified with the findOverlaps function in the GenomicRanges R package [51].
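The overlap query can be illustrated with a plain interval comparison. This is not the GenomicRanges implementation: the function name is hypothetical, coordinates are treated as half-open (GenomicRanges uses 1-based closed ranges), and any shared base counts as an overlap.

```python
def overlapping_regions(query, subject):
    """Return the query regions that overlap at least one subject region.

    Regions are (chrom, start, end) tuples with half-open coordinates.
    """
    by_chrom = {}
    for chrom, start, end in subject:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = []
    for chrom, start, end in query:
        if any(start < s_end and s_start < end
               for s_start, s_end in by_chrom.get(chrom, [])):
            hits.append((chrom, start, end))
    return hits


# Hypothetical DhMRs from cfDNA (query) and from primary tumors (subject).
cfdna_dhmrs = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 10, 20)]
tumor_dhmrs = [("chr1", 150, 160), ("chr2", 20, 30)]
shared = overlapping_regions(cfdna_dhmrs, tumor_dhmrs)
```

This quadratic scan is adequate for a toy example; genome-scale region sets would normally use an interval tree or sorted sweep.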

Statistical analysis

Unpaired Student’s t-test was applied for the statistical analyses in Fig. 1c, 1e, 2b, 3a, g, h, 5a, b, S1c, S2c, S4c, S5a, S5b. Paired Student’s t-test was applied for the statistical analysis in Fig. 1d. All data are presented as mean ± SEM, and P < 0.05 was considered statistically significant.
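The summary statistics named above can be sketched with the Python standard library. The function names are hypothetical, and the unpaired statistic below assumes equal variances (the classical pooled-variance Student's t-test), which the text does not specify. Converting t to a P value requires the t distribution and is omitted here.

```python
from math import sqrt
from statistics import mean, stdev, variance


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


def unpaired_t_statistic(a, b):
    """Pooled-variance Student's t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical measurements for two groups.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
m, sem = mean_sem(group_a)
t_value, dof = unpaired_t_statistic(group_a, group_b)
```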

Fig. 1

DNA 5hmC modification exhibits dynamic changes in the process of breast tumorigenesis. a Representative images of H & E staining and DNA 5hmC immunohistochemical staining (IHC) in UDH, ADH, DCIS, and IDC tissues. b Representative images showing the ducts of breast lesions (UDH, ADH, DCIS, and IDC) identified for quantitative analyses of 5hmC staining. The nuclei of epithelial cells in the targeted ducts are outlined with red circles. c Quantitative comparison of DNA 5hmC levels based on IHC images (a) across UDH, ADH, DCIS, and IDC samples. The number of samples in each group is given in parentheses. d Pairwise comparison of DNA 5hmC levels between UDH and ADH, and between ADH and DCIS. e Quantification of 5hmC levels obtained by LC–MS/MS analysis of each sample in the DCIS and IDC groups. Scale bars represent 50 μm and 100 μm. P values were calculated using unpaired t-test in c and e; paired t-test was used in d. *P < 0.05, ****P < 0.0001, ns: not significant. UDH, usual ductal hyperplasia; ADH, atypical ductal hyperplasia; DCIS, ductal carcinoma in situ; IDC, invasive ductal breast cancer

Fig. 2

Genome-wide characteristics and dynamic changes of DNA 5hmC in different stages of breast tumors. a Workflow of breast tumor macro-dissection. b Boxplots showing relative DNA 5hmC levels of UDH, ADH, DCIS, and IDC samples. c Volcano plots highlighting DhMRs (P value < 0.05 and |Log2(foldchange)| > 1, hyper: red; hypo: blue) in each phase of breast tumorigenesis. d Distribution of hyper-DhMRs and hypo-DhMRs across genomic regions in each phase of breast tumorigenesis. e Distribution of DhMRs in TSS-surrounding regions (TSS ± 3 kb); the red line indicates hyper-DhMRs in the later stage, and the blue line indicates hypo-DhMRs in the later stage. f Numbers of genes possessing DhMRs (DhMGs) in each phase of breast tumorigenesis. g Heatmap showing the regions that exhibit a continual increase of 5hmC from UDH to IDC. h KEGG enrichment analysis of the genes with continual hyper-DhMRs. i Heatmap showing the regions that exhibit a continual decrease of 5hmC from UDH to IDC. j KEGG enrichment analysis of the genes with continual hypo-DhMRs. ***P < 0.001. DhMRs: differentially hydroxymethylated regions; TSS: transcription start site

Fig. 3

Dynamic changes of genomic 5mC along with 5hmC in the early phase of breast tumorigenesis. a Boxplots showing relative DNA 5mC levels of UDH, ADH, and DCIS samples. b Volcano plots displaying DMRs (P value < 0.05 and |Log2(foldchange)| > 1) identified in the early stages of breast precancerous lesions. c Distribution of hyper-DMRs and hypo-DMRs across genomic regions in the early-stage breast precancerous lesions. d Distribution of DMRs in TSS-surrounding regions (TSS ± 3 kb); the red lines indicate hyper-DMRs in the later stage, and the blue lines indicate hypo-DMRs in the later stage. e, f Enrichment of DMRs around DhMRs (left), and the enrichment of DhMRs around DMRs (right) in phase I (e) and phase II (f). g Comparison of the degree of 5hmC changes between DMRs and non-DMRs. h Comparison of the degree of 5mC changes between DhMRs and non-DhMRs. ***P < 0.001, ****P < 0.0001, ns: not significant. DMRs: differentially methylated regions; DhMRs: differentially hydroxymethylated regions; TSS: transcription start site

Results

Dynamic changes of DNA 5-hydroxymethylcytosine across different stages of breast precancerous lesions

To investigate the level of DNA 5hmC in breast tumors at different stages, we conducted immunohistochemical staining analysis of 5hmC using samples from patients with UDH, ADH, DCIS, and early-stage IDC (T1N0M0) (Fig. 1a). As breast precursor lesions primarily arise from the luminal epithelial compartment of terminal duct lobular units (TDLUs), we focused on breast ductal epithelial cells to assess the staining intensity of 5hmC (Fig. 1b). A consistent and significant decrease in 5hmC levels was observed as lesions progressed from UDH to ADH and subsequently to DCIS. However, there was a modest increase in 5hmC levels from DCIS to IDC. Notably, this trend held across various patient samples (Fig. 1c). Interestingly, when comparing samples of UDH, ADH, and DCIS within the same histological sections, we found a marked reduction of 5hmC abundance in the tumor cells of the more advanced stage (Fig. 1d). To validate these findings, we performed a quantitative analysis of 5hmC levels using liquid chromatography–tandem mass spectrometry (LC–MS/MS) and confirmed an increasing trend in 5hmC levels from DCIS to IDC (Fig. 1e). Taken together, the dynamic changes of 5hmC in different stages of breast tumors imply that DNA 5hmC is involved in breast tumorigenesis and may have a crucial role in the transformation of precancerous lesions to invasive cancers.

Genome-wide reprogramming of DNA 5-hydroxymethylcytosine during breast tumorigenesis

To examine the dynamic alterations in 5hmC associated with the progression of breast tumorigenesis at the genome-wide level, four distinct types of breast tumors (UDH, ADH, DCIS, and IDC) were utilized for hMeDIP-seq analysis (Tables 1, 2 and 3). Considering that breast cancer is a heterogeneous disease and mainly originates from epithelium, we specifically conducted macro-dissection to obtain abnormal ducts and cancer cells for 5hmC analysis (Fig. 2a and Additional file 1).

Table 3 Information of the samples used for genomic profiling analyses of DNA 5hmC, 5mC, and RNA-seq

The genome-wide profiling revealed that 5hmC peaks were predominantly distributed in intronic and distal intergenic regions across all stages. However, as lesions progressed from UDH to DCIS, a gradual reduction in the proportion of 5hmC peaks situated in promoters, exons, 5′UTRs, and 3′UTRs was observed, followed by an increase in IDC (Additional file 2: Fig. S1a). Furthermore, a marked enrichment of 5hmC peaks was observed in the vicinity of transcription start sites (TSSs) in both UDH and IDC samples. In contrast, 5hmC modifications were evenly distributed across gene bodies in ADH and DCIS samples (Additional file 2: Fig. S1b). In terms of modification levels, a steady decline in 5hmC levels was observed from UDH to DCIS, followed by an increase from DCIS to IDC (Fig. 2b), which is consistent with our previous observations. Furthermore, similar trends in the alteration of 5hmC levels were observed in specific genomic regions across these four stages (Additional file 2: Fig. S1c).

Building upon these dynamic 5hmC changes, pairwise comparisons between adjacent stages were performed to identify differentially hydroxymethylated regions (DhMRs, P value < 0.05 and |Log2(foldchange)| > 1) in the three phases (Fig. 2c, Table 4, and Additional file 3: Table S3). Regarding genomic distribution, an increase in the proportion of hyper-DhMRs in promoters, exons, 5′UTRs, and 3′UTRs was observed from phase I to phase III, whereas the proportion of hypo-DhMRs in these regions decreased (Fig. 2d). Moreover, hypo-DhMRs in phase I and hyper-DhMRs in phases II–III were significantly enriched around TSS, while hypo-DhMRs in phases II–III displayed a bimodal distribution near TSS (Fig. 2e). Subsequently, it was observed that the number of genes harboring hyper-DhMRs (hyper-DhMGs) increased during breast tumorigenesis, while the number of genes harboring hypo-DhMRs (hypo-DhMGs) decreased (Fig. 2f). KEGG enrichment analysis and GSEA of DhMGs revealed that both hyper- and hypo-DhMGs were closely associated with cancer-related pathways (Additional file 2: Fig. S1d–S1e and Table S1). Notably, 2307 genes harboring 2938 DhMRs exhibited continual 5hmC gain (Fig. 2g), and these genes were enriched in pathways such as cancer, RAS, RAP1, MAPK signaling pathways, and axon guidance (Fig. 2h). Conversely, 2036 DhMRs, annotated to 1501 genes, exhibited continual 5hmC loss (Fig. 2i). The corresponding hypo-DhMGs were significantly enriched in cancer-related pathways, including cell cytoskeleton, cell adhesion, and Hippo signaling pathway (Fig. 2j). These comprehensive genome-wide profiling analyses of breast lesions unveiled the dynamic changes in DNA 5hmC throughout the four stages of breast tumorigenesis. The distribution of DhMRs around TSS suggests their potential roles in transcriptional regulation. Additionally, genes exhibiting altered 5hmC modifications may play a crucial role in the development of early-stage breast cancer, particularly those associated with continual 5hmC changes. These findings underscore the significance of 5hmC as a potential epigenetic regulator in breast tumorigenesis.

Table 4 Numbers of DhMRs and DMRs in each phase of breast tumorigenesis

Coincidence of dynamic 5-hydroxymethylcytosine with active DNA demethylation in the early stage of breast precancerous lesions

In addition to 5hmC, 5mC has also been recognized as a significant contributor to the progression of cancers, including breast cancer [52]. However, how DNA 5hmC and 5mC cooperate in different phases of breast tumorigenesis to promote tumor progression remains unclear. Recent microarray-based analyses have indicated that, although changes in 5mC between DCIS and IDC are limited, there are more differentially methylated genes (DMGs) between normal breast tissue and DCIS [34]. Consequently, we conducted genome-wide 5mC profiling analyses of UDH, ADH, and DCIS (Tables 1, 2 and 3) to delve into the dynamic changes of DNA 5mC during breast tumorigenesis.

Unlike 5hmC (Additional file 2: Fig. S1a), the percentage of 5mC peaks located in each regulatory element remained consistent across different stages of breast tumors (Additional file 2: Fig. S2a). In addition, the genomic pattern of 5mC was similar among all samples, characterized by a depletion around TSS (Additional file 2: Fig. S2b). The global level of 5mC first increased from UDH to ADH and then decreased from ADH to DCIS (Fig. 3a). The trend of global 5mC levels was mirrored in the promoter, downstream, intergenic, and intron regions (Additional file 2: Fig. S2c). In contrast, the 5mC levels in the exon, 5′UTR, and 3′UTR regions were higher in UDH and DCIS compared to ADH (Additional file 2: Fig. S2c).

Subsequently, differentially methylated regions (DMRs) that were either reinforced (hyper-DMRs) or diminished (hypo-DMRs) as the lesions progressed, were identified by comparative analysis of 5mC profiles between adjacent stages. Notably, 11,507 hypo-DMRs and 157 hyper-DMRs were identified in phase I, as well as 5,112 hypo-DMRs and 14,112 hyper-DMRs in phase II (Fig. 3b, Table 4, and Additional file 4: Table S4). Concurrently, an increase in the percentage of DMRs (both hyper-DMRs and hypo-DMRs) located in promoters was observed in phase II compared to phase I (Fig. 3c). Interestingly, both hyper- and hypo-DMRs in phase II exhibited a preference for enrichment around TSS, whereas such feature was not observed in phase I (Fig. 3d). Furthermore, KEGG and GSEA analyses revealed that the genes harboring either hypo- or hyper-DMRs were enriched in pathways closely associated with cancer progression (Additional file 2: Fig. S2d–S2f and Table S2).

Given that 5hmC is established through the oxidation of 5mC, we further investigated the correlation between 5hmC- and 5mC-modified genes during the aforementioned phases based on pairwise comparisons of DhMRs and DMRs in each phase. Considering the enrichment of DhMRs and DMRs around TSS, their positional relations were initially explored. In phase I, the majority of DMRs were enriched in the regions centered around DhMRs, while DhMRs were evenly distributed from DMRs to their downstream regions (Fig. 3e). Conversely, in phase II, most DMRs were centered around the DhMRs, and vice versa (Fig. 3f). Subsequently, 332 and 1,269 regions displaying significant changes in both 5hmC and 5mC simultaneously (P value < 0.05 and |Log2(foldchange)| > 1) were identified in phases I and II, respectively (Additional file 2: Fig. S2g). Of note, the changes in 5hmC within DMRs were more prominent than those outside DMRs (Fig. 3g). In contrast, the changes in 5mC across DhMRs were comparable to those outside DhMRs (Fig. 3h). These results suggest that DhMRs and DMRs overlap during breast tumorigenesis. Additionally, the effect of 5mC changes on DhMR patterns appears to be more pronounced than that of 5hmC changes on DMR patterns, shedding light on the complex interplay between 5hmC- and 5mC-modified genes in breast tumorigenesis.

Synchronization of active DNA demethylation with active histone modifications to be involved in transcriptional regulation

Both DNA methylation and hydroxymethylation are pivotal transcriptional regulators with profound effects on gene expression. To investigate the impact of dynamic changes of 5hmC and 5mC on RNA expression during breast tumorigenesis, we re-analyzed a set of publicly available RNA-seq data from breast tumors (GSE47462) [42] that represents various stages of breast tumorigenesis (Additional file 2: Fig. S3a). As shown in Fig. 4a, a substantial number of differentially expressed genes (DEGs) exhibited concurrent differential 5hmC or 5mC levels. These DhMR/DMR-associated DEGs are implicated in the development of cancer (Additional file 2: Fig. S3b–S3c).

Fig. 4

DNA epigenetic modifications are associated with gene expression and the enrichment of specific histone modifications. a Venn diagram showing the overlap between DEGs and DhMGs/DMGs in each phase of breast tumorigenesis. b, c Distribution profiles of 5hmC (b) and 5mC (c) peaks in genes expressed at high (red), medium (yellow) and low (blue) levels. d, e Enrichment of H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K4me3, and H3K9me3 signals flanking hypo-DhMRs (d) and hyper-DhMRs (e) in each phase of breast tumorigenesis. f, g Enrichment of H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K4me3, and H3K9me3 signals around hypo-DMRs (f) and hyper-DMRs (g) in early phases of breast tumorigenesis. DEG: differentially expressed gene; DhMG: differentially hydroxymethylated gene; DMG: differentially methylated gene

To gain deeper insights into the effects of DNA 5hmC and 5mC on RNA expression within breast tumors, RNA-seq analyses were performed on the same samples previously subjected to hMeDIP-seq and MeDIP-seq (Tables 1, 2 and 3). The genes were categorized into three evenly sized groups based on the trisectional quantiles of their RNA expression levels, and the distribution of 5hmC and 5mC among these genes in each group was assessed. As a result, a positive correlation was observed between RNA expression levels and the frequencies of 5hmC in TSS and upstream regions (Fig. 4b and Additional file 2: Fig. S3d). Conversely, 5mC levels near TSS exhibited a negative regulatory effect on RNA expression (Fig. 4c and Additional file 2: Fig. S3e).

On the basis of the multifaceted effects of 5hmC and 5mC on transcriptional regulation and the crosstalk between DNA modifications and histone marks, we next investigated whether histone marks play a role in the regulatory effects of DNA 5hmC and 5mC. The distribution of common histone marks, including H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K4me3, and H3K9me3, around the corresponding DhMRs and DMRs was explored using ChIP-seq data of MCF7 cells from the ENCODE database [53]. This analysis unveiled correlations between DhMRs/DMRs and histone modifications in distinct phases of breast tumorigenesis. Specifically, hypo-DhMRs in phase I and hyper-DhMRs in phase III exhibited enrichment of the active histone marks H3K27ac and H3K4me3, which are frequently located at active promoters and associated with active transcription (Fig. 4d–4e). In parallel, hypo-DMRs in phase II appeared to show a slight enrichment of the enhancer-specific histone modifications H3K27ac and H3K4me1 (Fig. 4f–4g). Overall, both DhMRs and DMRs exhibited specific associations with several active histone marks, such as H3K27ac and H3K4me3, rather than with repressive ones. Moreover, these associations were phase-specific, suggesting a dynamic interplay between DNA modifications and histone marks that contributes to the progression of breast tumorigenesis. These results highlight the significance of 5hmC and 5mC enrichment around TSSs in influencing gene transcription, possibly through crosstalk with histone marks. Such a regulatory mechanism may be important for promoting breast tumorigenesis.

Cooperation of TET2 with ER complex and FOS to function in genomic repatterning of 5-hydroxymethylation and 5-methylation

Conversion of 5mC to 5hmC relies on the TET family of DNA demethylation enzymes. As previously reported, the expression of TET2 decreases with the progression of breast precancerous lesions [28]. To investigate the effect of TET2 on DNA demethylation during breast tumorigenesis, we conducted a comprehensive analysis by integrating public and in-house data from different stages of breast lesions.

We first obtained ChIP-seq data of TET2 in MCF7 cells from the GEO database (GSE153251) [46] to identify the binding regions of TET2. A significant proportion (approximately 56.4%) of TET2 binding regions were distributed across distal intergenic and promoter regions (Additional file 2: Fig. S4a). Meanwhile, it was observed that TET2 preferred to bind active and primed enhancers and promoters marked by H3K27ac, H3K4me1, and H3K4me3. Intriguingly, TET2-binding sites around H3K27ac and H3K4me1 displayed a bimodal enrichment pattern (Additional file 2: Fig. S4b). To explore the role of TET2 in transcriptional regulation, we performed integrated analyses on RNA-seq and hMeDIP-seq data obtained from breast tumors. Our findings highlighted that both TET2-binding sites and 5hmC deposition exerted a positive effect on RNA expression, with the highest levels of RNA expression in genomic regions meeting both criteria (Fig. 5a and Additional file 2: Fig. S4c). Consequently, we proposed that TET2-related 5hmC modification significantly contributes to the transcriptional regulation in breast tumors.

Fig. 5

TET2 co-localizes with TFs and participates in breast tumorigenesis through mediating DNA demethylation. a Density plot showing the distribution of expression level of RNAs encoded by TET2-binding genes or hydroxymethylated genes. b Boxplots showing the relative 5hmC levels of the TET2-binding hMRs (TET2-hMRs) and non-TET2-binding hMRs (nTET2-hMRs) in the UDH, ADH, DCIS, and IDC samples. c Distribution of TET2-binding hyper-DhMRs and hypo-DhMRs across genomic regions in each phase of breast tumorigenesis. d, e Enrichment of H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K4me3, and H3K9me3 around TET2-binding hypo-DhMRs (d) and TET2-binding hyper-DhMRs (e) in breast tumorigenesis. f Heatmap showing the enrichment of canonical motifs recognized by several TFs in TET2-binding DhMRs in each phase of breast tumorigenesis. g Enrichment of the crucial transcription factors around TET2-binding hyper-DhMRs and hypo-DhMRs in breast tumorigenesis. **P < 0.01, ***P < 0.001. nTET2-hMGs: Hydroxymethylated genes without TET2-binding regions; nTET2-unhMGs: Genes without 5hmC modifications and TET2-binding regions; TET2-hMGs: Hydroxymethylated genes with TET2-binding regions; TET2-unhMGs: Genes with TET2-binding regions but without 5hmC modifications

To gain a deeper understanding of the role of TET2 in modulating the dynamic changes of 5hmC during breast tumorigenesis, we compared the levels of 5hmC between hMRs with and without TET2 enrichment at each stage of breast tumors (Fig. 5b). The pattern of 5hmC change in TET2-binding regions across stages agreed with the global 5hmC changes in breast tumorigenesis (Fig. 2b): 5hmC levels declined continuously from UDH to DCIS and then showed an increasing trend. In addition, 5hmC levels in TET2-binding regions were consistently and significantly higher than those in non-TET2-binding regions at all stages of breast lesions (Fig. 5b). Thousands of DhMRs in all phases were recognized by TET2 (Additional file 2: Fig. S4d). The proportion of TET2-binding hyper-DhMRs in promoter, exon, and 3′UTR regions progressively increased from phase I to phase III, while hypo-DhMRs displayed the opposite trend (Fig. 5c). Based on our observation that DhMRs are preferentially located in cis-regulatory regions marked by active histone modifications (Fig. 4d–e), we further explored the enrichment of histone marks near TET2-binding DhMRs and observed a robust enrichment of the active histone mark H3K27ac (Fig. 5d–e). Moreover, the enhancer-specific histone mark H3K4me1 displayed a pattern similar to that of H3K27ac in TET2-binding hypo-DhMRs, albeit at a relatively low level of enrichment (Fig. 5d). Interestingly, both TET2-binding hypo-DhMRs in phase I and TET2-binding hyper-DhMRs in phase III exhibited associations with H3K4me3 (Fig. 5d–e).
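Separating hMRs into TET2-binding and non-TET2-binding sets amounts to an interval-overlap test against the TET2 ChIP-seq peaks. The following is a minimal sketch under stated assumptions: regions are (chromosome, start, end) tuples with half-open coordinates, any shared base counts as binding, and the function names and coordinates are hypothetical. The overlap criterion actually used in this study is not specified here, and a real analysis would rely on dedicated interval tools rather than this quadratic loop.

```python
def overlaps(region, peak):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return region[0] == peak[0] and region[1] < peak[2] and peak[1] < region[2]


def split_by_tet2(hmrs, tet2_peaks):
    """Partition hMRs into TET2-binding and non-TET2-binding lists."""
    bound, unbound = [], []
    for hmr in hmrs:
        if any(overlaps(hmr, peak) for peak in tet2_peaks):
            bound.append(hmr)
        else:
            unbound.append(hmr)
    return bound, unbound


# Toy coordinates (hypothetical, not taken from the study).
hmrs = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
tet2_peaks = [("chr1", 150, 250), ("chr2", 300, 400)]
tet2_hmrs, ntet2_hmrs = split_by_tet2(hmrs, tet2_peaks)
```

The 5hmC signal of the two resulting sets can then be compared stage by stage, as in the boxplots of Fig. 5b.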

Given the interplay between DhMRs and DMRs in genomic distribution and the crucial role of TET2 in DNA demethylation, we hypothesized that TET2 might also be involved in dynamic DMRs concurrently with TET2-targeted DhMRs. In our study, fewer TET2-binding DMRs than TET2-binding DhMRs were detected (Additional file 2: Fig. S4e). However, the TET2-binding hypo-DMRs exhibited an enrichment pattern of active histone marks, such as H3K27ac and H3K4me1, similar to that of TET2-binding DhMRs (Additional file 2: Fig. S4f). Thus, co-occupation of active histone marks and TET2-binding DhMRs/DMRs in promoter and enhancer regions suggests that TET2-related 5hmC and 5mC modifications play crucial roles in transcriptional regulation by modulating the activity of regulatory elements during breast tumorigenesis.

As TET2 is capable of coactivating crucial transcription factors (TFs) such as ERα (ESR1) and GATA3 in breast cancer cells, we then explored the potential TFs that are involved in TET2-related DhMRs/DMRs during breast tumorigenesis. Through an examination of enriched motifs in TET2-binding DhMRs, the motifs associated with several TFs were identified, including ESR1, FOXA1, FOS, FOSL2, FOXM1, and JUNB (Fig. 5f). Similar to TET2, these TFs are typically located within active chromatin regions marked by H3K27ac, H3K4me1, and H3K4me3 (Additional file 2: Fig. S4g). To confirm the involvement of the aforementioned TFs and GATA3 [46], we compared their enrichment levels surrounding DhMR/DMR regions at each phase of breast tumorigenesis. Our analyses revealed that ESR1, GATA3, FOXA1, and FOS exhibited pronounced enrichment around TET2-binding DhMRs in at least one phase of breast tumorigenesis (Fig. 5g and Additional file 2: Fig. S4h). As shown in Fig. 4b and 4c, both 5hmC and 5mC DNA modifications associated with transcriptional regulation were predominantly distributed around TSSs, which prompted us to investigate whether these four TFs are also located around DMRs during breast tumorigenesis. In Additional file 2: Fig. S4i, a strong enrichment of these TFs was observed in TET2-binding DMRs in phase I. In contrast, only TET2-binding hypo-DMRs, rather than hyper-DMRs, were enriched with the TFs in phase II. No similar phenomenon was observed in non-TET2-binding DMRs (Additional file 2: Fig. S4j). Therefore, we deduced that TET2-binding DhMRs and hypo-DMRs located in promoters and enhancers play a role in modulating gene expression. The transcription factors ESR1, GATA3, FOXA1, and FOS co-localize with TET2 and are likely to be involved in the dynamic changes of 5hmC and 5mC throughout breast tumorigenesis.

Identification of differentially hydroxymethylated regions as potential biomarkers for detecting early-stage breast cancer

In recent decades, liquid biopsy has attracted increasing attention as a non-invasive alternative to tissue biopsy for cancer screening and monitoring [54]. Although cfDNA 5hmC has proven its value as a biomarker for various cancers [15, 20, 55, 56], its potential application in breast cancer remains underexplored. Given the integral role of 5hmC in breast tumorigenesis, we utilized 5hmC sequencing data of cfDNA from a previous study [50] to identify common DhMRs shared between cfDNA and primary breast tumors. Such DhMRs hold promise as diagnostic markers for breast cancer screening.

Compared to healthy control samples, only a slight decrease in the global levels of 5hmC in cfDNA from patients with early-stage breast cancer was observed, while the changes in primary tumor samples were more pronounced (Additional file 2: Fig. S5a–S5b). Despite limited alterations in global 5hmC levels, 4881 hyper-DhMRs and 3570 hypo-DhMRs, annotated to 3408 and 2676 genes, respectively, were identified in cfDNA. Meanwhile, 3718 hyper-DhMRs and 12,254 hypo-DhMRs, annotated to 2977 and 5468 genes, respectively, were found in breast cancers compared to benign tumors (P value < 0.05 and |log2(fold change)| > 1) (Fig. 6a–6b). DhMRs in cfDNA and breast tissues were predominantly enriched in intronic and intergenic regions, followed by promoter regions (Additional file 2: Fig. S5c–S5d). Subsequent KEGG analyses revealed that DhMGs in cfDNA exhibited a strong enrichment in cancer-related pathways, such as PI3K-Akt, RAS-RAP1, MAPK, and cell adhesion (Fig. 6c). In addition, DhMGs in breast cancers were significantly enriched in cancer-related pathways (Fig. 6d). To further explore the 5hmC signals that can potentially be used for early detection of breast cancer, we identified 146 hypo-DhMRs and 12 hyper-DhMRs that exhibited concurrent changes in cfDNA and breast cancer tissues (Fig. 6e, f). Among the DhMR-annotated genes, KLF15, PTPRG, PPARGC1B [57], and ZFHX3 are closely related to estrogen signaling pathways and the development of mammary epithelial cells. Meanwhile, UNC5A [58], PIK3AP1, IGF1R, and HIF1A [59] have been shown to be crucial in the development and metastasis of breast cancer. These findings indicate that 5hmC signals in cfDNA may reflect the genomic characteristics of primary breast cancers and may become valuable candidates for early-stage breast cancer screening.
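The selection logic described above, thresholding on P value and fold change and then keeping regions that change in the same direction in cfDNA and tissue, can be sketched as follows. This is an illustrative sketch rather than the analysis code of this study: only the two cutoffs (P < 0.05 and |log2(fold change)| > 1) come from the text, whereas the function names, the toy records, and the matching of regions by a shared identifier (instead of by genomic overlap) are assumptions.

```python
def call_dhmrs(records, p_cutoff=0.05, lfc_cutoff=1.0):
    """Return (hyper, hypo) sets of region IDs passing both thresholds.

    Each record is (region_id, log2_fold_change, p_value).
    """
    hyper, hypo = set(), set()
    for region_id, log2fc, p_value in records:
        if p_value >= p_cutoff or abs(log2fc) <= lfc_cutoff:
            continue  # fails P < 0.05 or |log2FC| > 1
        (hyper if log2fc > 0 else hypo).add(region_id)
    return hyper, hypo


def concordant(cfdna, tissue):
    """Regions changing in the same direction in both sample types."""
    cf_hyper, cf_hypo = cfdna
    ts_hyper, ts_hypo = tissue
    return cf_hyper & ts_hyper, cf_hypo & ts_hypo


# Toy records (hypothetical values, for illustration only).
cfdna_records = [("r1", 1.5, 0.01), ("r2", -2.0, 0.03),
                 ("r3", 1.2, 0.20), ("r4", -1.4, 0.04)]
tissue_records = [("r1", 2.1, 0.001), ("r2", -1.1, 0.02),
                  ("r3", 1.8, 0.01), ("r4", 1.6, 0.01)]
shared_hyper, shared_hypo = concordant(
    call_dhmrs(cfdna_records), call_dhmrs(tissue_records)
)
```

In this toy input, a region that is significant in only one sample type, or that changes in opposite directions, is excluded from both shared sets.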

Fig. 6

Identification of potential 5-hydroxymethylcytosine signatures in cfDNA and early-stage primary breast cancers. a Volcano plots displaying hypo-DhMRs and hyper-DhMRs identified in cfDNA (left) and breast cancer tissues (right) compared to healthy controls. b The number of genes marked with hyper-DhMRs and hypo-DhMRs in cfDNA and breast cancer tissues. c KEGG enrichment analysis of hyper-DhMGs and hypo-DhMGs in cfDNA of breast cancer patients. d KEGG enrichment analysis of hyper-DhMGs and hypo-DhMGs in breast cancer tissues. e, f Heatmap showing the 5hmC levels of 146 regions exhibiting 5hmC loss (e) and 12 regions exhibiting 5hmC gain (f) in both cfDNA and breast tissues. Healthy: healthy controls; Cancer: patients with breast cancer

Discussion

In this study, we comprehensively analyzed the genome-wide distribution of 5hmC and 5mC in different stages of breast tumors and the intricate crosstalk among 5hmC, 5mC, and histone modifications in transcriptional regulation. We also uncovered the pivotal role of TET2 in mediating the dynamic changes of 5hmC and 5mC throughout breast tumorigenesis, and identified key transcription factors (ESR1, GATA3, FOXA1, and FOS) that collaborate with TET2 in orchestrating transcriptional regulation in breast tumors. Furthermore, we identified synchronous DhMRs in cfDNA and primary breast tissues, which hold potential as liquid biopsy biomarkers for breast cancer screening.

Despite the considerable reduction in DNA 5hmC observed in various cancers, the exact dynamic changes of 5hmC throughout breast tumorigenesis remain elusive. Our results revealed a gradual decreasing trend in 5hmC levels from UDH to DCIS. However, contrary to a previous study [28], we observed an upward trend from DCIS to IDC, as evidenced by IHC and LC–MS/MS analysis. Consistently, our hMeDIP-seq analysis showed that the number of hyper-DhMRs/DhMGs was twice as high as the number of hypo-DhMRs/DhMGs in phase III. Considering that those DhMGs are closely related to the adhesion and invasion of cancer cells, we speculate that this change in 5hmC in phase III reflects the requirements of tumor progression from DCIS to IDC. However, the factors that determine the dynamic changes of 5hmC remain to be investigated.

Breast tumors have been reported to be highly heterogeneous, exhibiting both intra- and inter-tumor heterogeneity. Specifically, there are significant variations in 5hmC levels not only among epithelial cells, mesenchymal cells, and infiltrating lymphocytes (intra-tumor heterogeneity), but also among different tumors (inter-tumor heterogeneity). As breast precursor lesions primarily arise from the luminal epithelial compartment of TDLUs, we focused exclusively on the tumorigenesis of luminal epithelial cells. Therefore, abnormal ducts and cancer cells were meticulously collected by macro-dissection for genome-wide and transcriptome-wide analyses. Moreover, to reduce the inter-tumor heterogeneity among breast cancers of different molecular subtypes, we only selected DCIS and IDC samples of the luminal subtype, which accounts for two-thirds of primary breast cancers. Consistently, all external data (ChIP-seq of histone marks, TET2, and transcription factors) analyzed in this study were generated using the MCF7 cell line, which also belongs to the luminal subtype. Therefore, our findings on 5hmC pertain to breast cancer of the luminal subtype, and the role of 5hmC in breast cancers of other molecular subtypes remains elusive.

The genome-wide profiling revealed a broad reduction of both 5hmC and 5mC in the initial phases of breast tumorigenesis. Notably, DhMRs and DMRs were frequently situated around TSSs and overlapped each other in phases I and II, underscoring the significance of active DNA demethylation in the early phases of breast tumorigenesis. In our study, changes in 5mC were more subdued than those in 5hmC throughout breast tumorigenesis, and the correlation between histone marks and DhMRs was notably stronger than that observed with DMRs, which leads to the speculation that 5hmC plays a more prominent role in the progression of breast cancer.

Intriguingly, a close association was observed between DhMRs and histone modifications at active enhancers and promoters rather than gene bodies. These connections suggest that 5hmC located in these regulatory regions may influence the enrichment of the corresponding chromatin marks. Though the link between 5mC and transcriptional silencing is well established, no correlation between DMRs and repressive histone marks such as H3K27me3 and H3K9me3 was observed in breast lesions, as previously identified in other tissues [60, 61]. Similar to DhMRs, DMRs in our study were also located in active chromatin regions, but with less pronounced enrichment. These observations hint that the repressive impact of 5mC on transcription may involve its collaboration with active histone marks or other repressive histone marks (except for H3K27me3 and H3K9me3) within breast tissue. In alignment with active histone marks, the abundance of 5hmC and 5mC around TSSs showed a strong correlation with gene transcription levels, as described in previous studies [62, 63]. Collectively, our findings suggest that during breast tumorigenesis, both 5hmC and 5mC play pivotal roles in transcriptional regulation by coordinating with active histone marks in regulatory regions.

In the present study, we found that among all DhMRs/DMRs, active histone marks tended to be enriched in the TET2-binding DhMRs/DMRs, particularly in the promoter regions, indicative of co-localization of various TFs and TET2. Consistent with previous reports [46], ESR1 and GATA3 were observed to be involved in TET2-binding DhMRs. Furthermore, FOXA1, the pioneer factor for ESR1, was enriched in TET2-binding DhMRs. Although the interaction between FOXA1 and TET2 in prostate cancer has been reported, the role of FOXA1 in TET2-related DhMRs in breast cancer remains to be investigated. Given these observations, we speculated that the interaction of the ER complex with TET2 is vital in breast tissue. Additionally, in a study by Broome et al. [46], no global 5mC change was observed in TET2-knockdown breast cancer cell lines. However, thousands of TET2-binding DMRs were identified through our comprehensive analysis of profiling data from clinical samples. Furthermore, the enrichment of ESR1, GATA3, and FOXA1 around TET2-binding DMRs mirrored the patterns observed in TET2-binding DhMRs, with ESR1 being notably more enriched in TET2-binding DMRs than in DhMRs. Therefore, we speculated that the TET2-ER complex may drive breast tumorigenesis by affecting both DNA methylation and demethylation. In addition to the ER complex, we propose here, for the first time, the involvement of the proto-oncogene FOS in TET2-related DhMRs/DMRs. Moreover, FOXM1 was found to be enriched in TET2-binding DMRs in breast tumorigenesis. These findings indicate that FOS and FOXM1 may be additional interactors of TET2 in breast tumors, but further studies are needed to elucidate the nature of these interactions and the role of FOXM1 in TET2-binding DMRs.

Research on cfDNA-based 5hmC has made significant progress in cancer screening, primarily in the context of the digestive and hematologic systems [15, 20, 55, 56]. Recently, Curtis et al. proposed that metastatic seeding of breast cancer may occur 2–4 years before the diagnosis of the primary tumor [64]. Building on data from Quake et al.’s study on cfDNA [50], thousands of DhMRs associated with breast cancer were identified between breast cancer patients and healthy controls. Compared to organs of the digestive system, changes of 5hmC in the cfDNA of breast cancer patients were relatively subtle, and these changes may be attributed to the limited blood flow to the breast. Therefore, the development of more sensitive 5hmC detection methods with low-input cfDNA remains imperative to gather more valuable insights for cancer screening. Our integrative analysis of cfDNA and breast tissue collectively suggests that 5hmC in cfDNA may serve as a valuable biomarker for early-stage invasive breast cancer screening. However, in order to develop reliable 5hmC-based biomarkers, a larger cohort and comprehensive pairwise comparisons are essential to identify common 5hmC features in cfDNA and primary breast tumors.

Conclusions

Taken together, the dynamic changes of DNA 5hmC and 5mC in breast lesions and their effects on transcriptional regulation are crucial in propelling the malignant transformation of breast tumors. TET2-related DNA demethylation, histone marks, and TFs may act in concert to promote breast tumorigenesis through transcriptional regulation. In addition, 5hmC-based biomarkers are promising and merit further investigation for the screening of breast precancerous lesions and for liquid biopsy of breast cancer.