Epigenetic alterations affecting hematopoietic regulatory networks as drivers of mixed myeloid/lymphoid leukemia

Mulet-Lazaro, Roger; van Herk, Stanley; Nuetzel, Margit; Sijs-Szabo, Aniko; Díaz, Noelia; Kelly, Katherine; Erpelinck-Verschueren, Claudia; Schwarzfischer-Pfeilschifter, Lucia; Stanewsky, Hanna; Ackermann, Ute; Glatz, Dagmar; Raithel, Johanna; Fischer, Alexander; Pohl, Sandra; Rijneveld, Anita; Vaquerizas, Juan M.; Thiede, Christian; Plass, Christoph; Wouters, Bas J.; Delwel, Ruud; Rehli, Michael; Gebhard, Claudia

doi:10.1038/s41467-024-49811-y

Article
Open access
Published: 07 July 2024

Volume 15, article number 5693, (2024)

Abstract

Leukemias with ambiguous lineage comprise several loosely defined entities, often without a clear mechanistic basis. Here, we extensively profile the epigenome and transcriptome of a subgroup of such leukemias with CpG Island Methylator Phenotype. These leukemias exhibit comparable hybrid myeloid/lymphoid epigenetic landscapes, yet heterogeneous genetic alterations, suggesting they are defined by their shared epigenetic profile rather than common genetic lesions. Gene expression enrichment reveals similarity with early T-cell precursor acute lymphoblastic leukemia and a lymphoid progenitor cell of origin. In line with this, integration of differential DNA methylation and gene expression shows widespread silencing of myeloid transcription factors. Moreover, binding sites for hematopoietic transcription factors, including CEBPA, SPI1 and LEF1, are uniquely inaccessible in these leukemias. Hypermethylation also results in loss of CTCF binding, accompanied by changes in chromatin interactions involving key transcription factors. In conclusion, epigenetic dysregulation, and not genetic lesions, explains the mixed phenotype of this group of leukemias with ambiguous lineage. The data collected here constitute a useful and comprehensive epigenomic reference for subsequent studies of acute myeloid leukemias, T-cell acute lymphoblastic leukemias and mixed-phenotype leukemias.

Introduction

Research on the pathogenesis of leukemia has traditionally emphasized the role of genetic lesions, but the importance of epigenetic dysregulation is increasingly recognized. Several epigenetic modulators are recurrently mutated in acute myeloid leukemia (AML) and T-cell acute lymphoblastic leukemia (T-ALL), including methylation regulators (DNMT3A, TET2, IDH1/2) and histone modifiers (EZH2, SUZ12, KMT2A, KDM6A)^1,2. In addition, numerous instances of epigenetic dysregulation leading to aberrant expression of proto-oncogenes have been documented, such as the enhancer hijacking that drives EVI1 overexpression in 3q26-rearranged AML^3,4 or the formation of a super-enhancer driving TAL1 upregulation in T-ALL⁵. However, recurrent epigenetic events may also occur independently of any known genetic lesion, possibly due to the selection of clones that spontaneously acquire these alterations. For example, hypermethylation of DNMT3A recapitulates the effects of mutations in this gene⁶.

Therefore, genetic characterization of leukemia may be insufficient to identify all critical pathogenic mechanisms. Accordingly, clustering of AML samples by gene expression reveals subgroups that share known genetic lesions, but also others that cannot be linked to any known abnormality⁷. One such subgroup was later found to be defined by CEBPA silencing due to hypermethylation⁸. This CEBPA-silenced cluster exhibited a mixed myeloid/T-lymphoid phenotype, resistance to myeloid growth factors, and possibly poor prognosis. Subsequent analyses unveiled a genome-wide hypermethylation signature that distinguished this subgroup from both AML and T-ALL, but no mutations typically associated with methylation defects⁹. More recently, an AML subtype with similar characteristics and methylation localized to CpG islands (CGIs) was identified and labeled CpG Island Methylator Phenotype (CIMP)^10,11. We hypothesize that CIMP and CEBPA-silenced leukemias are the same entity, hereinafter referred to as CIMP leukemias.

Hypermethylation of CGIs is a frequent event in cancer that often results in silencing of tumor suppressor genes¹². Although DNA methylation is traditionally associated with transcriptional repression, its cellular functions are, in fact, much more complex^13,14. Transcriptional repression in the presence of methylation is thought to be caused by (a) impaired TF binding, and (b) recruitment of chromatin remodelers via methyl-binding domain (MBD) proteins¹³. However, many DNA-binding factors can bind methylated sequences, whereas others may be repelled by DNA methylation¹⁵. A notable example of the latter is CTCF, which plays critical roles as an insulator, transcriptional repressor or activator, and architectural protein¹⁶. Thus, aberrant methylation can disrupt CTCF-dependent boundaries of topologically associating domains (TADs), resulting in dysregulated expression of neighboring genes^17,18.

Leukemias with ambiguous lineage pose substantial challenges for diagnosis and treatment¹⁹. Mixed phenotype acute leukemias with myeloid and T-lymphoid features (T/M MPAL) are defined as a separate category by the World Health Organization (WHO) and the International Consensus Classification (ICC), based on co-expression of markers such as CD3 and MPO^20,21. Moreover, a subtype of T-ALL, known as early T-cell precursor leukemia (ETP-ALL), also exhibits a combination of myeloid and lymphoid surface markers^22,23. Recent studies have shown that ETP-ALL and T/M MPAL are similar at the genetic and epigenetic level²⁴, suggesting an overlap between these two entities. The emerging question is how CIMP leukemias, originally diagnosed as AMLs, are related to these other categories from a molecular perspective.

Here, we characterize the poorly understood CIMP leukemias by integrating multiple layers of genetic and epigenetic data. This integrative analysis reveals that these are immature leukemias with features from both AML and T-ALL, resembling ETP-ALL, and that hypermethylation results in the silencing of several lineage-specific TFs, accompanied by loss of accessibility at their binding sites. Similarly, methylation impairs CTCF binding, leading to chromatin remodeling and secondary changes in gene expression that contribute to the unique phenotype of these leukemias.

Results

Global DNA methylation identifies a distinct group of hypermethylated leukemias

Previous studies in separate AML cohorts independently identified clusters of patients with genome-wide hypermethylation, but no mutations typically related to DNA methylation^8,10. We profiled the methylome of 16 of these patients together with 49 other primary AMLs and CD34+ cells from 3 healthy donors (Supplementary Data 1). We used methyl-CpG immunoprecipitation coupled with sequencing (MCIP-seq) to assay 71,000 CpG-rich regions, covering 89% of the 28,691 CpG islands in the human genome (Supplementary Fig. 1a). More than half of the MCIP-seq peaks were near transcriptional start sites (TSS) (Supplementary Fig. 1b).
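The coverage figure above (the share of CpG islands overlapped by assayed regions) is an interval-overlap calculation. A minimal Python sketch follows, assuming BED-like half-open intervals and counting an island as covered if it shares at least 1 bp with any assayed region; the function name and the 1-bp rule are hypothetical choices for illustration, not the criterion used in the study.

```python
from bisect import bisect_left
from collections import defaultdict


# Illustrative sketch; function name and 1-bp overlap rule are hypothetical.
def fraction_covered(features, regions):
    """Fraction of features (chrom, start, end) overlapping >= 1 bp of any region.

    Intervals are half-open [start, end), as in BED files.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    # Per chromosome: sorted region starts plus the running maximum of region
    # ends, so "does any region starting before X end after Y?" is one lookup.
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_end, running = [], float("-inf")
        for _, e in intervals:
            running = max(running, e)
            max_end.append(running)
        index[chrom] = (starts, max_end)

    covered = 0
    for chrom, start, end in features:
        if chrom not in index:
            continue
        starts, max_end = index[chrom]
        k = bisect_left(starts, end)  # number of regions with start < feature end
        if k > 0 and max_end[k - 1] > start:
            covered += 1
    return covered / len(features) if features else 0.0


regions = [("chr1", 100, 200), ("chr1", 500, 600)]
islands = [("chr1", 150, 160), ("chr1", 200, 300), ("chr1", 550, 700), ("chr2", 0, 10)]
print(fraction_covered(islands, regions))  # 0.5
```

Tracking the running maximum of region ends keeps each query logarithmic even when regions overlap or nest.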

Unsupervised hierarchical clustering (Fig. 1a, Supplementary Fig. 1c) indicated that CIMP leukemia constitutes a separate subgroup with strong hypermethylation, particularly at regions hypomethylated in CD34+ cells. Samples from both studied cohorts (CIMP-EMC, originally CEBPA-silenced, and CIMP-UKR) clustered together, supporting the hypothesis that they belong to the same disease entity. This observation was supported by principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) (Supplementary Fig. 1d). Taken together, these data confirm that CIMP leukemias are a distinct entity characterized by global hypermethylation of CpG-rich regions.

**Fig. 1: Epigenetic and transcriptional landscape of CIMP leukemias compared to other leukemias and CD34+ cells.**

The epigenetic landscape of CIMP leukemias reveals an intermediate state between T-ALL and AML

To understand the regulatory underpinnings of their lineage ambiguity, we comprehensively profiled the epigenetic and transcriptional landscape of CIMP leukemias, as well as that of T-ALL, AML, and CD34+ cells from healthy donors (Fig. 1b).

Dimensionality reduction of MCIP-seq data showed that CIMP cases exhibit a methylation profile very close to that of most T-ALLs and markedly separate from that of the large majority of AMLs (Fig. 1c, Supplementary Fig. 2a). This was corroborated by Infinium MethylationEPIC array data from a subset of 5 CIMPs analyzed together with publicly available T-ALL²⁵ and AML²⁶ data (Supplementary Fig. 2b). In particular, CIMP leukemias clustered close to ETP-ALL, another entity with mixed myeloid/lymphoid features, and to highly methylated T-ALL cases (referred to as CIMP-positive T-ALL). However, gene expression in the CIMP cases of our cohort was intermediate between AML and T-ALL, with most displaying stronger similarity to CEBPA double-mutant (CEBPA DM) AML than to ETP-ALL (Fig. 1c, Supplementary Fig. 2c). To explain this apparent discrepancy between methylation and transcription, we next analyzed H3K27ac ChIP-seq and ATAC-seq (Fig. 1c, Supplementary Fig. 2c). In both datasets, most CIMPs unequivocally clustered with CEBPA DM AML, with only 3 T-ALLs (one of which was ETP) included in the same group. Hierarchical clustering largely confirmed these findings (Supplementary Fig. 2d–g).

In summary, CIMP leukemias are an intermediate entity between AML and T-ALL with a hybrid epigenetic profile in which CpG-poor enhancer chromatin is accessible at myeloid regulatory regions, but transcription is repressed at CGI promoters by DNA methylation.

CIMP leukemias are genetically heterogeneous

To elucidate whether genetic aberrations lie at the base of CIMP leukemias, we conducted whole exome sequencing (WES). We found frequent single nucleotide variants (SNVs) and indels in NOTCH1, PHF6, MED12, WT1, IKZF1 and JAK3, but none were common to all individuals (Supplementary Data 2, Fig. 2a). Recurrent copy number alterations (CNAs) were identified in genes related to leukemia and hematopoiesis, among which deletions of a region containing NF1, EVI2A and EVI2B were particularly frequent (n = 6) (Supplementary Fig. 3c, Supplementary Data 3, 4). Most CIMP patients (9/14) exhibited gains or losses of at least one chromosomal arm (Fig. 2b, Supplementary Fig. 3a, b). RNA-seq data did not reveal any recurrent fusion genes, although a few patients carried fusions previously reported in leukemia^2,27,28 (Supplementary Fig. 3d, Supplementary Data 5). Of note, 6/13 patients analyzed did not harbor any detectable fusion gene.

**Fig. 2: The mutational landscape and gene expression signature of CIMP leukemias suggest similarity with ETP-ALL and a very early lymphoid progenitor as the cell of origin.**

Altogether, CIMP leukemias constitute a genetically heterogeneous subgroup, defined by epigenetic rather than genetic commonalities. Their mutational profiles are comparable to those of other acute leukemias of ambiguous lineage, especially ETP-ALL (see Supplementary Data 6–8 and Supplementary Results), suggesting there may be some overlap between these entities.

Transcriptional signatures suggest that the cell of origin is an early progenitor with a possible lymphoid bias

In line with their hybrid epigenetic profile and their previously reported phenotype⁸, CIMP leukemias expressed both typical myeloid markers such as CD13, CD33, CD34 and KIT (Fig. 2c) and lymphoid markers like CD7 and CD3 (Fig. 2d). Notably, the surface markers used for immunophenotypic classification of ETP-ALL (expression of CD7, absence of CD1A and CD8, weak expression of CD5, and presence of myeloid markers such as CD13, CD33, CD34 and CD117/KIT^23,29) were expressed accordingly in CIMP, whereas those that define T/M MPAL (presence of either MPO or monocytic markers like CD11c, CD14, CD64 or lysozyme (LYZ), concomitantly with CD3 expression²⁰) were not. To further investigate lineage relationships at the transcriptional level, we conducted gene set enrichment analysis (GSEA) (Fig. 2e, Supplementary Data 9–12). As expected, myeloid gene sets (e.g., Dick_GMP250) were downregulated compared to AML but upregulated compared to T-ALL; the reverse was true for T-lymphoid genes. Interestingly, the top results from the comparison with T-ALL were gene sets derived from ETP-ALL relative to other T-ALLs (e.g., ETP-ALL_Zhang_Up, Supplementary Fig. 4a), as well as gene sets related to hematopoietic stem cells (HSCs) (e.g., Dick_HSC250). A comparison with CD34+ cells revealed upregulation of T-cell signatures, including ETP, but downregulation of HSC gene sets.
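To make concrete what an "enriched" or "depleted" gene set means in these comparisons, here is a minimal Python sketch of the weighted running-sum enrichment score that GSEA is built on. It is an illustration only: it omits the permutation-based normalization and significance testing of the actual GSEA software, and the function name and toy genes are hypothetical.

```python
# Illustrative sketch of a GSEA-style enrichment score; names are hypothetical.
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score.

    ranked_genes: genes ordered from most up- to most downregulated.
    scores: ranking metric per gene, in the same order.
    p: weight exponent for gene-set members (p = 0 gives an unweighted,
       Kolmogorov-Smirnov-like statistic).
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_norm == 0:
        raise ValueError("all gene-set members have zero weight")

    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up by the member's weighted share, or down evenly for non-members.
        running += abs(s) ** p / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


genes = ["a", "b", "c", "d"]
metric = [4.0, 3.0, 2.0, 1.0]
print(enrichment_score(genes, metric, {"a", "b"}))  # close to +1: set at the top
print(enrichment_score(genes, metric, {"c", "d"}))  # close to -1: set at the bottom
```

A positive score means the set's members concentrate at the upregulated end of the ranking (as for T-cell signatures versus CD34+ cells), a negative score at the downregulated end.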

Transcriptional deconvolution with CIBERSORTx³⁰ showed enrichment for both common lymphoid progenitor (CLP) and granulocyte-monocyte progenitor (GMP) signatures, concomitantly with depletion of expression profiles associated with terminally differentiated cells, including T cells, monocytes, and neutrophils (Fig. 2f, Supplementary Fig. 4b). In single-sample GSEA with a selected set of hematopoiesis-related gene sets, the CIMP group exhibited enrichment for HSC genes as well as myeloid and lymphoid signatures halfway between AML and T-ALL (Supplementary Fig. 4c, d).

Taken together, these results suggest that the cell of origin of CIMP leukemias is an early hematopoietic progenitor, possibly committed to the lymphoid lineage. This lymphoid skewing is supported by the upregulation of T-cell signatures relative to hematopoietic stem and progenitor cells (HSPCs), and also by the higher similarity with T-ALL in terms of recurrent mutations and methylation profiles. The upregulation of ETP-ALL gene sets further emphasizes the similarity with this other entity of ambiguous lineage.

Promoter methylation changes correlate with silencing of critical hematopoietic factors in CIMP leukemias

Next, we investigated the effects of methylation in CIMPs in relation to other leukemias and normal controls using MCIP-seq data. CIMPs exhibited profound hypermethylation at CGIs, promoters, enhancers, TSS, and gene bodies compared to AML or HSPCs (Fig. 3a, Supplementary Fig. 5a, b). By contrast, methylation levels in T-ALL were similar to those in CIMP. Promoter methylation levels were highest in CIMP, followed by T-ALL, AML, and HSPCs (Fig. 3a–c). These differences are consistent with the higher levels of methylation in the lymphoid lineage^31,32,33. However, genome-wide hypermethylation was not present in terminally differentiated cells of those lineages (Supplementary Fig. 5c, d). Differential methylation analysis confirmed extensive hypermethylation in CIMP compared to AML and, to a lesser extent, to T-ALL (Fig. 3d, Supplementary Data 13). Methylation array data analogously revealed larger differences between CIMP and AML than between CIMP and T-ALL, particularly at promoters (Supplementary Fig. 5e, f, Supplementary Data 14–16). Interestingly, genes hypomethylated in CIMP relative to T-ALL or ETP-ALL were associated with neutrophil activation, whereas hypermethylated genes were involved in T-cell development (Supplementary Fig. 5h, i, Supplementary Data 17–19). Thus, despite their similarities, CIMP leukemias are slightly more myeloid than ETP-ALLs, which may explain their original diagnosis as AML.

**Fig. 3: Functional assessment of methylation differences between CIMP and other leukemias.**

Hypermethylation was more pronounced in regions marked by both H3K4me3 and H3K27me3 (Fig. 3c, Supplementary Fig. 5j, k), typically referred to as bivalent promoters³⁴. This is in agreement with reports showing that bivalent promoters are more susceptible to DNA hypermethylation in cancer cell lines and primary tumors^10,35. GSEA of genes near differentially methylated regions (DMRs) confirmed preferential hypermethylation of H3K27me3 targets in CIMP relative to AML, T-ALL, and CD34+ HSPCs (Fig. 3e, Supplementary Fig. 5l, m, Supplementary Data 20–22). This phenomenon may explain the frequent hypermethylation of CGI-associated promoters of genes active in non-hematopoietic lineages, such as neurons (Supplementary Fig. 5g).

Moreover, GSEA detected an enrichment of transcription factor (TF) genes in methylated regions relative to both AML and CD34+ cells. In addition, some of the DMRs with the strongest increases were adjacent to genes such as CEBPA, LEF1, PLK2, MEIS1 or TLE4, which are known to be involved in leukemia or hematopoiesis (Fig. 3f, Supplementary Fig. 5n). Integration of gene expression data confirmed that CIMP hypermethylation was accompanied by widespread gene silencing, with methylation levels negatively correlating with gene expression in comparisons with both AML (r = −0.23, p < 2.2 × 10⁻¹⁶) and T-ALL (r = −0.18, p < 2.2 × 10⁻¹⁶). A total of 100 TFs with hypermethylated promoters were downregulated relative to AML, including hematopoietic regulators and genes known to be involved in leukemia, such as CEBPA, CEBPD, KLF4, IRF4 or MEIS1 (Fig. 4a, b, Supplementary Fig. 6a, b, Supplementary Data 23). Among the few TF genes differentially methylated between CIMP and T-ALL was LEF1, which participates in the early stages of thymocyte maturation³⁶ and is crucial for neutrophilic granulopoiesis³⁷. Interestingly, the tyrosine kinase LYN was exclusively repressed in T-ALL.
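The r values above summarize, across genes, how a change in promoter methylation pairs with a change in expression in one comparison. A minimal Python sketch of that Pearson correlation follows; the per-gene values are invented toy numbers and the variable names are hypothetical, not data from the study.

```python
from math import sqrt


# Illustrative sketch; variable names and toy values are hypothetical.
def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy per-gene values for one comparison (e.g. CIMP vs AML).
delta_methylation = [2.1, 1.5, 0.2, -0.3, 1.8]    # promoter methylation change
expression_log2fc = [-3.0, -1.2, 0.1, 0.4, -2.2]  # RNA-seq log2 fold change
r = pearson_r(delta_methylation, expression_log2fc)
print(round(r, 2))  # negative: methylation gain pairs with lower expression
```

With genome-wide data the same statistic is computed over thousands of genes, which is why even a modest r can reach the very small p values reported.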

**Fig. 4: Integration of methylation and gene expression data reveals silencing of hematopoietic-related transcription factors.**

In order to study whether epigenetic changes outside promoters have effects on the transcriptional program of these leukemias, we also integrated RNA-seq with ATAC-seq. We excluded peaks overlapping with a TSS to select putative enhancers, which we then linked to target genes following an ensemble approach. As expected, changes in accessibility at these regions correlated with gene expression in differential analyses between CIMP and AML (r = 0.17, p value < 2.2 × 10⁻¹⁶) and T-ALL (r = 0.15, p value < 2.2 × 10⁻¹⁶), albeit only weakly (Fig. 4c, Supplementary Data 24). Similar results were observed for H3K27ac (Supplementary Fig. 6c, Supplementary Data 25). This modest correlation can be explained by multiple factors, such as the multiplicity of enhancers controlling the same promoter, the existence of concomitant epigenetic processes that participate in gene regulation (especially DNA methylation), and erroneous enhancer-promoter assignment.
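The ensemble approach used to link putative enhancers to genes is not detailed in this section. As a deliberately simpler stand-in, the Python sketch below drops ATAC-seq peaks that overlap a TSS and assigns each remaining peak to the gene with the nearest TSS on the same chromosome, within a hypothetical 100-kb window. Nearest-gene assignment is a swapped-in heuristic, not the authors' method, and the function name and window size are assumptions.

```python
# Illustrative sketch; nearest-TSS linking and the 100-kb window are assumptions.
def link_peaks_to_genes(peaks, tss, max_dist=100_000):
    """Map non-promoter peaks to the gene with the nearest TSS.

    peaks: (chrom, start, end) half-open intervals from ATAC-seq.
    tss: (chrom, position, gene) tuples.
    max_dist: cutoff in bp beyond which a peak stays unlinked.
    """
    tss_by_chrom = {}
    for chrom, pos, gene in tss:
        tss_by_chrom.setdefault(chrom, []).append((pos, gene))

    links = {}
    for chrom, start, end in peaks:
        candidates = tss_by_chrom.get(chrom, [])
        if any(start <= pos < end for pos, _ in candidates):
            continue  # peak overlaps a TSS: treated as promoter, not enhancer
        best = None
        for pos, gene in candidates:
            # Distance from the TSS to the closer edge of the peak.
            dist = min(abs(pos - start), abs(pos - (end - 1)))
            if dist <= max_dist and (best is None or dist < best[0]):
                best = (dist, gene)
        if best is not None:
            links[(chrom, start, end)] = best[1]
    return links


tss = [("chr1", 1100, "A"), ("chr1", 9000, "B")]
peaks = [("chr1", 1000, 1200), ("chr1", 5000, 5200), ("chr1", 500000, 500100)]
print(link_peaks_to_genes(peaks, tss))  # {('chr1', 5000, 5200): 'B'}
```

Misassignments of this kind are one of the reasons, noted above, why accessibility and expression correlate only weakly.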

In summary, many critical TFs are silenced by methylation in CIMP leukemias, which possibly explains the intermediate transcriptional state of these leukemias, as well as their differentiation arrest.

Hematopoietic TF networks are rewired in CIMP leukemias

The transcriptional effects of DNA methylation are thought to stem from altered binding affinities for TFs and other regulatory proteins¹³. To elucidate the role of this mechanism in CIMP leukemias, we analyzed the overlap between DMRs and experimentally validated TF binding sites (TFBS) using LOLA on both methylation array and MCIP-seq data. In keeping with the GSEA results, we detected a strong enrichment for binding of the PRC2 complex at hypermethylated regions relative to AML, but also for hematopoietic factors like RUNX1, CBFB, SPI1 or GATA1, as well as CTCF and cohesin (Fig. 5a, Supplementary Fig. 7a left side, Supplementary Fig. 7c, Supplementary Data 26, Supplementary Data 29). Surprisingly, the myeloid TFs CEBPA/B and SPI1 were also overrepresented at hypomethylated positions. When compared to T-ALL, hypermethylated DMRs were strongly enriched for binding sites of lymphoid TFs, notably NOTCH1 and MYB, and of transcriptional regulators like RNA Pol II and CTCF (Fig. 5a, Supplementary Fig. 7a, d, Supplementary Data 27 and 30). Together with the presence of SPI1, CEBPB, and CEBPA sites in hypomethylated DMRs, this suggests that a substantial number of myeloid regions remain active, thus preserving myeloid potential in CIMP cases. In keeping with this, CIMP leukemias also exhibited higher methylation than ETP-ALL at NOTCH1, MYB, and TAL1 binding sites, and hypomethylation of TFBS for myeloid master regulators such as CEBPA, CEBPB, and SPI1 (Supplementary Fig. 7b, Supplementary Data 28).
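Overlap enrichment of the kind reported by LOLA asks whether query regions (here, DMRs) overlap a TF binding-site collection more often than the rest of a background universe. LOLA reports Fisher's exact test statistics for this; the Python sketch below implements a generic one-sided Fisher's exact test on a 2×2 table to show the idea. The table layout and function name are assumptions and may not match how LOLA constructs its table internally.

```python
from math import comb


# Illustrative sketch; table layout and function name are assumptions.
def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the 2x2 table

        [[a, b],   a: query regions overlapping the TFBS set
         [c, d]]   b: query regions not overlapping it
                   c: non-query universe regions overlapping it
                   d: non-query universe regions not overlapping it

    X follows the hypergeometric distribution given the fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    upper = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)) / denom


print(fisher_greater(3, 0, 0, 3))  # 0.05
```

A small p value indicates that the query regions overlap the binding-site set more than expected from the universe, which is the sense in which PRC2 or CTCF sites are "enriched" above.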

**Fig. 5: Disturbance of hematopoietic regulatory networks in CIMP leukemias suspends both hematopoiesis and lymphopoiesis.**

To further investigate changes in TF activity linked to lineage commitment, we used chromVAR to calculate deviations of chromatin accessibility at TF motifs from ATAC-seq data (Supplementary Fig. 8a, details in Supplementary Results). Chromatin at C/EBP and HLF binding sites was significantly less accessible in CIMP than in AML, whereas RUNX, FOX, and NFATC motifs, among others, were more active (Fig. 5b, Supplementary Fig. 8b, Supplementary Data 31). LEF1, TCF7, and E2F motifs were largely closed in CIMP compared to T-ALL, but those for SPI1 (PU.1) and BACH were more open. In line with these observations, differential accessibility analysis between AML and T-ALL established that accessible C/EBP and SPI1 sites are a hallmark of AML, whereas T-lymphoid leukemias are driven by LEF1, TCF7, and RUNX1, among others. These results were confirmed by footprinting analysis with TOBIAS, which predicts TF binding status based on dips in the depth of coverage at their motifs relative to surrounding open chromatin peaks (Supplementary Fig. 8c, d, Supplementary Data 32, 33). In addition to the abovementioned TFs, TOBIAS showed a significant increase in the binding of GATA factors in CIMP compared to AML. Furthermore, this technique also revealed that loss of KLF4 and CEBPA expression is accompanied by reduced binding of TFs at their promoters, possibly due to hypermethylation (Supplementary Fig. 8e). For validation, we conducted ChIP-seq for CEBPA, SPI1, and TCF7, predicted to be dysregulated. In all three cases, the results were in line with the differential motif activity inferred from open chromatin: CEBPA binding was completely absent in CIMP and T-ALL, and SPI1 exhibited higher binding in CIMP than in T-ALL, whereas the reverse was true for TCF7 (Supplementary Fig. 9a, b). Of note, ChIP-seq also corroborated that SPI1 binding is indeed lost at the KLF4 promoter (Supplementary Fig. 9c).
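As we understand chromVAR's approach, it compares, per sample, the fragments falling into all peaks that share a motif with the number expected if that sample distributed its fragments across peaks like the pooled data, then corrects the result with GC- and accessibility-matched background peak sets. The Python sketch below computes only the raw deviation; it omits the background correction and z-scoring, and the function name and toy counts are hypothetical.

```python
# Illustrative sketch of a raw (uncorrected) chromVAR-style deviation;
# function name and toy counts are hypothetical.
def raw_motif_deviations(counts, motif_peaks):
    """Per-sample raw accessibility deviation for one motif.

    counts[i][j]: fragments in peak i for sample j.
    motif_peaks: indices of peaks that contain the motif.
    Expected counts assume each sample spreads its fragments over peaks in
    the same proportions as the pooled data; returns
    (observed - expected) / expected, summed over the motif peaks.
    """
    n_samples = len(counts[0])
    total = sum(sum(row) for row in counts)
    sample_totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    peak_share = [sum(row) / total for row in counts]  # pooled fraction per peak
    deviations = []
    for j in range(n_samples):
        observed = sum(counts[i][j] for i in motif_peaks)
        expected = sum(peak_share[i] for i in motif_peaks) * sample_totals[j]
        deviations.append((observed - expected) / expected)
    return deviations


# Three peaks, two samples; the motif sits only in peak 0.
counts = [[10, 0], [0, 10], [5, 5]]
print(raw_motif_deviations(counts, [0]))  # about [1.0, -1.0]
```

A positive value means motif-containing peaks are more accessible in that sample than the pooled data predict (as for RUNX motifs in CIMP versus AML); a negative value means they are less accessible (as for C/EBP motifs).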

Integration with gene expression across 79 patients for whom both RNA-seq and ATAC-seq were available revealed that motif activity was largely consistent with gene expression (Fig. 5c). High correlation between expression levels and accessibility identified CEBPB and SPI1 as positive regulators in AML, whereas GATA3 and TCF7 played this role in T-ALL (Fig. 5d, Supplementary Fig. 8f, Supplementary Data 34). Integrated analysis of differential motif accessibility and expression pinpointed a few key TFs that drive the CIMP phenotype: loss of C/EBP proteins and gain of GATA3 separate them from AML, whereas reduced LEF1 and TCF7 prevent lymphoid lineage commitment (Fig. 5e). Interestingly, the motifs for SPI1 and CEBPA were concomitantly active in the same regions and mutually exclusive with LEF1 and GATA3 (Supplementary Fig. 8g).

Despite genome-wide epigenetic and transcriptional alterations (Supplementary Data 9, 13, 35–37), it appears that methylation-mediated silencing of a reduced set of critical hematopoietic TFs is a major driver of the altered phenotype of CIMP leukemias, ultimately impairing their ability to fully commit to either the lymphoid or the myeloid lineage.

Genome-wide hypermethylation leads to widespread loss of CTCF binding

Since DNA methylation may weaken the binding of CTCF^38,39, the hypermethylation observed at CTCF binding sites (Fig. 3a, Fig. 5a) suggested a possible loss of CTCF binding at those locations. Indeed, CTCF ChIP-seq showed that global CTCF levels were lower in CIMP than in AML and T-ALL (Fig. 6a, b). Differential analysis confirmed widespread loss of CTCF binding in CIMP with respect to AML, and to a lesser extent compared to T-ALL (Fig. 6c, Supplementary Data 38).

**Fig. 6: Hypermethylation in CIMP leukemias leads to loss of CTCF binding.**

For additional insight into the interplay between methylation and CTCF binding, we integrated these data with MCIP-seq, which covered 23% of the CTCF sites detected by ChIP-seq (Supplementary Fig. 11a). As expected, CTCF levels genome-wide were higher in regions with low DNA methylation (Fig. 6b) and gain of DNA methylation in CIMP cases correlated with loss of CTCF binding in the same regions relative to AML (rho = −0.27, p value < 2.2 × 10⁻¹⁶) (Fig. 6d, Supplementary Data 39). No meaningful correlation was observed when comparing CIMP to T-ALL, possibly due to insufficient DMRs. CIMP-versus-AML hypermethylation was more pronounced at CTCF binding sites (Supplementary Fig. 11b), especially at those that were lost (Fig. 6e), suggesting those regions are particularly prone to methylation changes.
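The rho value above is a Spearman rank correlation between methylation gain and change in CTCF binding across shared regions. A minimal Python sketch follows, using average ranks for ties and then a Pearson correlation of the ranks; the function names and toy region values are hypothetical.

```python
from math import sqrt


# Illustrative sketch; function names and toy values are hypothetical.
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    m = (len(rx) + 1) / 2  # mean rank is (n + 1) / 2, with or without ties
    cov = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - m) ** 2 for a in rx))
    sy = sqrt(sum((b - m) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy regions: methylation gain vs CTCF log2 fold change (CIMP vs AML).
methylation_gain = [0.5, 1.0, 2.0, 3.5]
ctcf_change = [0.1, -0.4, -1.1, -2.0]
print(spearman_rho(methylation_gain, ctcf_change))  # -1.0 for this monotonic toy
```

Because it works on ranks, Spearman's rho captures any monotonic relationship and is less sensitive than Pearson's r to a few extreme regions.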

The invariability of CTCF binding at many other locations (Supplementary Fig. 11c) is in accord with previous studies indicating that only certain CTCF binding sites are sensitive to methylation, such as those with a CpG in their motif^40,41. To explore this possibility, we computed the frequency of CpG dinucleotides at every position of the canonical CTCF motif; this frequency shows two peaks, at positions 5 and 15 (Supplementary Fig. 11d). CTCF motifs found in regions with loss of CTCF binding and hypermethylation exhibited CpGs at those two positions more frequently than regions where CTCF binding remained unchanged or increased (Supplementary Fig. 11e).
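The per-position CpG profile described above can be computed by counting, across aligned motif instances, how often a C at one position is followed by a G at the next. A minimal Python sketch follows; it uses 0-based positions (the numbering convention behind positions 5 and 15 is not stated in this section), and the function name and toy sequences are hypothetical rather than real CTCF sites.

```python
# Illustrative sketch; function name and toy sequences are hypothetical.
def cpg_frequency_by_position(sites):
    """Fraction of motif instances carrying a CpG that starts at each position.

    sites: equal-length sequences, all oriented on the motif strand.
    Position p (0-based) counts a CpG when base p is C and base p + 1 is G,
    so the result has (motif length - 1) entries.
    """
    if not sites:
        raise ValueError("no sites given")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("motif instances must have equal length")
    counts = [0] * (length - 1)
    for site in sites:
        site = site.upper()
        for p in range(length - 1):
            if site[p] == "C" and site[p + 1] == "G":
                counts[p] += 1
    return [c / len(sites) for c in counts]


print(cpg_frequency_by_position(["ACGT", "cgat"]))  # [0.5, 0.5, 0.0]
```

Running this separately on motifs from regions with lost versus unchanged CTCF binding, and comparing the two profiles position by position, mirrors the comparison described above.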

Loss of CTCF binding is accompanied by changes in 3D organization

Given the prominent role of CTCF in the stabilization of cohesin-mediated chromatin loops^42,43, we conducted in situ Hi-C experiments on CIMP (n = 9), AML (n = 5), T-ALL (n = 4) and HSPCs (n = 3) to assess changes in 3D genome organization. Roughly 40–50% of CTCF binding sites overlapped with TAD boundaries and 10–20% with loop anchors, without differences between variable and unchanged peaks (Supplementary Fig. 12a). We detected a clear separation between AML and other leukemias both at the level of TADs (Fig. 7a, Supplementary Fig. 12b) and loops (Fig. 7b, Supplementary Fig. 12c). Most CIMP cases clustered together with T-ALLs, with a few (DD46, DD63) exhibiting stronger similarity with AML. Supervised comparisons of differential loops or interactions (DIs) and variable TADs (ΔTADs) confirmed that differences between CIMP and AML were larger than between CIMP and T-ALL, but smaller than between AML and T-ALL (Fig. 7c, d).

**Fig. 7: Chromatin interaction landscape of CIMP and other leukemias.**

Contrary to our expectations, we did not observe a widespread depletion of chromatin loops or TADs upon loss of CTCF binding in CIMP cases. However, 72% of the loops lost in CIMP relative to AML exhibited decreased CTCF binding in at least one of their anchors, compared to 59% in gained interactions (Fig. 7e). Moreover, the average decrease in CTCF binding was significantly larger in lost interactions (Supplementary Fig. 12e). Therefore, while most changes of chromatin conformation in CIMP seem to occur independently of hypermethylation-derived loss of CTCF binding, the latter has a contributing role. Such effects were not observed in comparisons with T-ALL (Supplementary Fig. 12d, e).
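The 72% versus 59% comparison above is a simple proportion: among lost (or gained) loops, how many have decreased CTCF binding in at least one anchor. A minimal Python sketch follows, assuming loops are given as pairs of anchor identifiers and CTCF changes as a per-anchor mapping; the identifiers, the change measure, and the optional threshold are hypothetical.

```python
# Illustrative sketch; anchor identifiers, change measure and threshold are hypothetical.
def fraction_with_ctcf_loss(loops, ctcf_change, threshold=0.0):
    """Fraction of loops with decreased CTCF binding in at least one anchor.

    loops: (anchor_id_1, anchor_id_2) pairs, e.g. loops lost in CIMP vs AML.
    ctcf_change: anchor_id -> change in CTCF signal (e.g. log2 fold change);
        anchors missing from the mapping are treated as unchanged.
    threshold: an anchor counts as decreased when its change is < -threshold.
    """
    if not loops:
        return 0.0
    n = sum(
        1
        for a1, a2 in loops
        if ctcf_change.get(a1, 0.0) < -threshold or ctcf_change.get(a2, 0.0) < -threshold
    )
    return n / len(loops)


loops = [("a", "b"), ("c", "d"), ("e", "f")]
change = {"a": -1.0, "d": 0.5, "e": -0.2, "f": -0.3}
print(fraction_with_ctcf_loss(loops, change))       # 2 of 3 loops
print(fraction_with_ctcf_loss(loops, change, 0.5))  # 1 of 3 loops
```

Comparing the fraction for lost loops against the fraction for gained loops, rather than against zero, controls for the general prevalence of CTCF loss at loop anchors.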

Next, we conducted an unbiased survey of ΔTADs (Supplementary Data 40) and DIs (Supplementary Data 41) with associated changes in CTCF binding and potential implications for gene expression. When comparing CIMP and AML, we found 61 ΔTADs containing differentially expressed genes with loss of CTCF binding at their boundaries and 71 differential enhancer-promoter loops, whose interaction strength strongly correlated with the expression of genes they contacted (rho = 0.67, p = 1.8 × 10⁻¹⁰, Fig. 7f). Among others, loss of insulation was detected at the TADs containing KLF4 (Fig. 7g) and CEBPD (Fig. 7h), both of which also displayed decreased chromatin interactions accompanied by reduced CTCF binding. Interestingly, their promoters were also methylated, suggesting a possible cooperation between distinct epigenetic mechanisms in repression. Examples of gained enhancer-promoter interactions included a loop connecting GATA3 with a nearby enhancer element specific to CIMP (Fig. 7i) and a loop involving the promoter of DNMT3B (Supplementary Fig. 12i). More details are provided in the Supplementary Results.

In sum, CIMPs exhibit partial rewiring of chromatin interactions when compared to AML, of which only a fraction is attributable to loss of CTCF. However, this mild remodeling correlates with the misexpression of some essential TFs. Very few 3D genome differences could be detected between CIMP and T-ALL, in line with the notion that these leukemias originate from a lymphoid-biased cell.

Discussion

In this study, we investigated a group of leukemias with a CpG Island Methylator Phenotype (CIMP) and mixed myeloid/lymphoid features. We established that this phenotype is the consequence of a hybrid epigenetic landscape that integrates methylation patterns similar to those of T-ALL with a regulatory landscape that retains significant myeloid potential, resulting in an intermediate transcriptional program. In the absence of common genetic lesions, this shared epigenetic profile seems to be the main defining feature of CIMP leukemias.

The existence of acute leukemias with a mixed myeloid/lymphoid phenotype has long been recognized⁴⁴. The 2022 WHO classification and the ICC identify T/M MPALs based on a small number of immunophenotypic markers^20,21, but the CIMP cases identified here do not always conform to these criteria (Fig. 2c, d). Moreover, the mutational profile of MPALs does not exactly match our findings²⁰. In the end, T/M MPAL is a broad category that may partially overlap with CIMP, but possibly encompasses multiple subtypes of leukemia with variable cells of origin and pathogenic mechanisms. Another well-known leukemia with ambiguous lineage is ETP-ALL²², defined initially by a gene expression signature derived from murine ETPs, but typically identified by associated membrane markers. Although CIMP and ETP-ALL cluster separately in genome-wide analyses of epigenetic and transcriptional data, they share immunophenotypic markers and mutational profiles. Therefore, CIMP and ETP-ALL may be related disease entities in a spectrum ranging from myeloid to lymphoid phenotypes, ultimately distinguished by their epigenetic features.

The hybrid epigenetic state of CIMP leukemias and their mixed phenotype invite the question of what their cell of origin is. DNA methylation is a stable mark of epigenetic memory that maintains cell identity across cell divisions⁴⁵, which has been exploited to predict cell types³² and identify the cell of origin in various cancers^46,47. The hypermethylation in CIMP and many T-ALLs with respect to AML could relate to the higher baseline activity of DNA methyltransferases in the lymphoid lineage^31,32,33. On the other hand, open chromatin and H3K27ac indicate proximity to the myeloid lineage, whereas transcriptional data paint an intermediate picture. The inconsistency between different analyses is a likely consequence of phenotypic plasticity and the heterogeneity of these leukemias, some of which appear more myeloid (e.g., #DD166, #3491). The emerging conclusion from these results and the transcriptional signatures detected by CIBERSORTx and GSEA is that CIMPs are likely to stem from an early progenitor, possibly lymphoid-primed, but with the capability to differentiate into myelo-erythroid cell types. Although the specific cell of origin remains uncertain, the lymphoid-primed multipotent progenitor (LMPP) compartment is compatible with these observations. Of note, Zhang et al. reported that ETP-ALLs are enriched for GMP and HSC gene sets and thus derive from HSPCs, rather than ETPs as initially thought²⁴. This is congruent with our observations in CIMP, once again underscoring the similarity between these entities, and suggests they both derive from very early lymphoid progenitors.

Aberrant methylation results in silencing of several critical TFs involved in myeloid lineage specification, including CEBPA, KLF4 and IRF4. Interestingly, IRF4 and a few other genes, such as MAFB (another inducer of monocytic maturation) and KLF4, are completely repressed in CIMP, whereas they remain active in some T-ALL cases. Some TFs involved in lymphopoiesis are also exclusively silenced in CIMP leukemias, such as LEF1, a nuclear mediator of WNT signaling that regulates early stages of thymocyte maturation³⁶ and represses CD4+ T-cell programs in CD8+ T cells⁴⁸. Deletion of LEF1 results in the upregulation of non-T-lymphoid genes via genome reorganization⁴⁹, which could contribute to the mixed phenotype observed here. Accompanying these transcriptional changes were alterations in methylation and chromatin accessibility at binding sites of critical myeloid (CEBP family, SPI1), lymphoid (NOTCH1, LEF1, TCF7) and other hematopoietic (BACH2, HLF, IRF4) factors. In line with this, hypermethylation of SPI1 binding sites has been reported as a driver of leukemogenesis in TET2-mutated AML⁵⁰. Altogether, hypermethylation of key TFs and their binding sites in CIMP leukemias dysregulates hematopoietic networks, suspending differentiation at an intermediate, unresolved epigenetic state.

Several lines of evidence suggest that the loss of C/EBP TFs is a cornerstone of the leukemogenic process in CIMP leukemias. Firstly, all members of the family, except for the antagonistic CEBPG, are silenced by promoter hypermethylation. Secondly, chromatin accessibility at regions containing C/EBP motifs is drastically reduced. Thirdly, CIMP exhibits epigenetic similarity with AML subtypes in which CEBPA is either repressed or dysfunctional, namely t(8;21) AML and CEBPA DM AML. Given that CEBPA promotes myelopoiesis at the expense of lymphoid commitment⁵¹, its inactivation without compensation is likely to be a common driver of differentiation block in both CIMP and AML. Interestingly, the +42-kb enhancer that drives CEBPA expression in myeloid cells⁵² is active in both CIMP and AML, but absent in T-ALL (Supplementary Fig. 13a, b). It is thus tempting to speculate that transformation took place in a cell type that would normally be primed to express CEBPA, once again pointing to an early progenitor that is only biased towards the lymphoid lineage, but retaining multilineage priming.

DNA hypermethylation was also pronounced at CTCF binding sites and was accompanied by widespread loss of CTCF binding. Since many of these sites colocalized with loop anchors and TAD boundaries, where CTCF stabilizes cohesin-mediated interactions, we expected a major impact on 3D genome organization. However, this was not the case. A possible explanation is that CTCF loss does not necessarily abolish TADs. While total depletion of CTCF does lead to a global loss of TADs⁵³, alteration of a single CTCF site may^54,55 or may not^56,57 be sufficient to perturb a TAD boundary. This is partly because many TAD boundaries harbor clusters of redundant CTCF binding sites that make them resilient to small changes^58,59, and partly because alternative mechanisms preserve TAD boundaries⁵⁷. Depletion of CTCF must be near complete to significantly affect TAD insulation⁵³, which explains why the limited CTCF loss caused by methylation changes has mostly modest effects. On the other hand, although CTCF is present at the vast majority of TAD boundaries, it is only found at a small fraction of enhancer-promoter loops⁴², which are frequently occupied instead by YY1^60,61. In addition, the small number of differential interactions identified may be a consequence of the limited sample size and resolution of this Hi-C dataset.

Nonetheless, hypermethylation-driven CTCF loss modulates 3D organization at specific loci, in keeping with previous studies^17,18. This phenomenon may be complemented by changes in TFs such as LEF1, which also modulates chromatin interactions⁴⁹. A striking example is the KLF4 locus, which is inactive in CIMP: several loops and TAD insulation are disrupted there, presumably abolishing the interaction between the KLF4 promoter and putative enhancers. The lost CTCF binding site that normally stabilizes these loops lies at the KLF4 promoter, which is hypermethylated. Among its multiple roles in hematopoiesis⁶², KLF4 is required for monocyte differentiation⁶³, whereas its downregulation is required for lineage commitment of T cells⁶⁴. During these processes, KLF4 stimulates the formation of open chromatin and directly establishes de novo chromatin loops independently of CTCF^65,66, possibly explaining changes in the 3D structure of CIMP leukemias that do not co-occur with variations in CTCF binding. Inactivation of KLF4 by promoter methylation has previously been reported in T-ALL⁶⁷ and chronic lymphocytic leukemia (CLL)⁶⁸, consistent with the observation that inhibition of T-cell genes by KLF4 impairs T-ALL progression⁶⁹. Thus, the complete loss of KLF4 in CIMPs potentially contributes to a blockade of the myeloid trajectory while enabling the expression of lymphoid genes. Notably, the expression of KLF4 in CLL can be rescued by inhibition of NOTCH1, which is frequently mutated in CLL⁶⁸. As 43% of the CIMP cases also carry activating NOTCH1 mutations, targeting NOTCH1 could be an attractive therapeutic avenue for these leukemias.

The mechanisms underlying aberrant methylation in CIMPs are uncertain. None of the recurrently mutated genes in this leukemia has any known involvement in the methylation machinery. However, TET2 expression was significantly downregulated relative to AML owing to promoter hypermethylation, whereas DNMT1, DNMT3A, and DNMT3B were slightly upregulated through either demethylation or gained chromatin interactions. In other words, aberrant methylation could result from the inactivation of demethylating enzymes coupled with an increase in de novo and maintenance methylation. As mentioned above, another likely possibility is that the methylation signature of CIMP leukemias is partially inherited from their cell of origin, explaining the similarity with a subset of T-ALLs. A distinctive feature of this aberrant methylation is that it preferentially localizes to bivalent promoters, in keeping with reports that bivalent promoters are susceptible to DNA hypermethylation in cancer³⁵. One possible explanation is that H3K4me3, which protects bivalent promoters against DNA methylation by DNMT3A^70,71, is lost in these regions (Supplementary Fig. 13d). Moreover, DNMT3A has been reported to associate with PRC2, which could lead to hypermethylation of H3K27me3-marked domains in the absence of protective H3K4me3⁷². This interaction could be facilitated by the lack of DNMT3L expression (Supplementary Data 9), as DNMT3L competes with DNMT3A and DNMT3B for interaction with PRC2⁷³.

In conclusion, CIMP leukemias are a group of immature leukemias of ambiguous lineage whose mixed phenotype reflects a hybrid epigenomic landscape, with methylation patterns of lymphoid leukemias superimposed on an enhancer repertoire that preserves a large degree of myeloid potential (Supplementary Fig. 13e). The repression of CEBPA likely plays a key role in locking out the myeloid lineage, while the formation of new loops enables the expression of T-cell genes like GATA3. At the same time, silencing of other TFs required for T-cell commitment, such as KLF4, prevents terminal differentiation of T cells. Taken together, this study provides a detailed picture of the unique epigenomic landscape of CIMP leukemias and identifies potential mechanisms driving their differentiation arrest. Furthermore, the data collected here constitute a useful epigenomic reference for subsequent studies in AML, T-ALL, and mixed phenotype leukemias.

Methods

Ethical statement

Our research complies with all relevant ethical regulations and was approved by the Medical Ethical Committee of the Erasmus University Medical Center (Medisch Ethische Toetsings Commissie). All patients provided written informed consent in accordance with the Declaration of Helsinki.

Patient material and data generation

Samples from AML, CIMP, and T-ALL patients and from healthy donors were collected from the biobanks of the Erasmus MC Hematology department (Rotterdam, The Netherlands), the University Hospital Regensburg Internal Medicine department (Regensburg, Germany) and the University Hospital Carl Gustav Carus (Dresden, Germany). Mononuclear cells were isolated from bone marrow or peripheral blood as described previously⁷. A summary of the data generated for each patient or donor is available in Supplementary Data 1.

Data generation and processing

Methyl-CpG immunoprecipitation sequencing (MCIp-seq)

To measure methylation, we employed methyl-CpG immunoprecipitation (MCIP), a technique that relies on a fusion protein consisting of the methyl-binding domain (MBD) of MBD2 and the Fc portion of IgG1 to detect methylated regions, exploiting the natural preference of the MBD for 5-methylcytosine (5-mC)⁷⁴. MCIP-seq was performed using the EpiMark® Methylated DNA Enrichment Kit (NEB, Frankfurt, Germany) according to the manufacturer's guidelines. In brief, genomic DNA was fragmented to an average size of 200 bp using the Covaris S220 sonication system (Covaris, Woburn, USA). Each sample (200 ng) was incubated with 15 µl MBD2-Fc/Protein A magnetic beads for 1 h at room temperature. Unbound DNA was washed off with washing buffer containing 500 mmol/L NaCl. Captured methylated DNA was recovered by adding 50 µl DNase-free water and incubating at 65 °C for 15 minutes. The distribution of CpG methylation densities in both fractions (unmethylated and methylated) was controlled by qPCR using primers covering the imprinted SNRPN locus and a genomic region lacking CpGs (empty 6.2). Sequencing libraries were prepared with the NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB) according to the manufacturer's instructions. The quality of dsDNA libraries was analyzed using the High Sensitivity D1000 ScreenTape Kit (Agilent) and concentrations were determined with the Qubit dsDNA HS Kit (Thermo Fisher Scientific). Libraries were single-end sequenced on a HiSeq 3000 (Illumina).

MCIP-seq reads were aligned to the human reference genome build hg19 with bowtie⁷⁵ (v1.1.1), and bigwig files were generated for visualization with deepTools bamCoverage⁷⁶ (v3.5.1) using the options --normalizeUsing RPKM --smoothLength 100 --binSize 20. Peak calling was performed with MACS2⁷⁷ (v2.2.7.1) using default settings and input DNA as a control. The resulting peaks were filtered against the ENCODE blacklisted regions⁷⁸. Furthermore, a list of regions accessible by MCIP-seq was defined based on data from monocytes treated with the CpG methyltransferase SssI. All peaks not overlapping with this list of mappable regions were considered false positives and discarded using bedtools intersect⁷⁹. Functional annotation of peaks was performed with the ChIPseeker (Supplementary Fig. 1a) and annotatr⁸⁰ (Supplementary Fig. 1b) R packages. The assay detected 71,000 CpG-rich regions with an average length of 650 base pairs (bp), covering 89% of the 28,691 CpG islands in the human genome (Supplementary Fig. 1a).
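For reference, the RPKM normalization requested with `--normalizeUsing RPKM` divides the read count in each bin by the bin length in kilobases and by the number of mapped reads in millions. The minimal Python sketch below reproduces only that arithmetic for a single bin; it is not deepTools code, it ignores the `--smoothLength` smoothing and any read filtering that bamCoverage applies, and the function name `rpkm` is illustrative.

```python
def rpkm(reads_in_bin: int, bin_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of bin per million mapped reads (single bin, no smoothing)."""
    bin_length_kb = bin_length_bp / 1_000
    mapped_millions = total_mapped_reads / 1_000_000
    return reads_in_bin / (bin_length_kb * mapped_millions)


# A 20-bp bin (as with --binSize 20) holding 4 reads in a library of 20 million mapped reads:
# 4 / (0.02 kb * 20 million) = 10
print(rpkm(4, 20, 20_000_000))
```

Because RPKM scales by library size, tracks from libraries sequenced at different depths become roughly comparable, although differences in signal composition between samples are not corrected by this normalization.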

RNA sequencing (RNA-seq)

DNA and RNA were isolated using the AllPrep DNA/RNA mini kit (Qiagen, #80204). RNA was converted into cDNA using the SuperScript II Reverse Transcriptase (Thermo Fisher Scientific) according to standard diagnostic procedures. Sample libraries were prepared from 500 ng of input RNA with the KAPA RNA HyperPrep Kit with RiboErase (HMR) (Roche) using Unique Dual Index adapters (Integrated DNA Technologies, Inc.). Amplified sample libraries were paired-end sequenced (2 × 100 bp) on the Novaseq 6000 platform (Illumina) and aligned against the human genome (hg19) using STAR v2.5.4b⁸¹.

Whole exome sequencing

The Genomic DNA Clean & Concentrator kit (ZYMO Research) was used to remove EDTA from the DNA samples. Sample libraries were prepared using 100 ng of input according to the KAPA HyperPlus Kit (Roche) using Unique Dual Index adapters (Integrated DNA Technologies, Inc.). Exomes were captured using the SeqCap EZ MedExome (Roche Nimblegen) according to SeqCap EZ HyperCap Library v1.0 Guide (Roche) with the xGen Universal blockers – TS Mix (Integrated DNA Technologies, Inc.). The amplified captured sample libraries were paired-end sequenced (2 × 100 bp) on the Novaseq 6000 platform (Illumina) and aligned to the hg19 reference genome using the Burrows-Wheeler Aligner (BWA)⁸², v0.7.15-r1140.

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq)

ChIP-seq for histone marks (H3K27ac, H3K27me3) was performed as described previously, with slight modifications⁸³. Briefly, cells were crosslinked with 1% formaldehyde for 10 minutes at room temperature, and the reaction was quenched with glycine at a final concentration of 0.125 M. Chromatin was sheared using the Covaris S220 focused-ultrasonicator to an average size of 250–350 bp. A total of 2.5 µg of antibody against H3K27ac (Abcam, ab4729) or H3K27me3 (Diagenode, C15410069) was added to sonicated chromatin of 2 × 10⁶ cells and incubated overnight at 4 °C. Protein A sepharose beads (GE Healthcare) were added to the ChIP reactions and incubated for 2 h at 4 °C. Beads were washed and chromatin was eluted. After crosslink reversal, RNase A, and proteinase K treatment, DNA was extracted with the Monarch PCR & DNA Cleanup kit (NEB). Sequencing libraries were prepared with the NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB) according to the manufacturer's instructions. The quality of dsDNA libraries was analyzed using the High Sensitivity D1000 ScreenTape Kit (Agilent), and concentrations were assessed with the Qubit dsDNA HS Kit (Thermo Fisher Scientific). Libraries were either single-end sequenced on a HiSeq 3000 (Illumina) or paired-end sequenced on a Novaseq 6000 (Illumina).

ChIP-seq for CTCF, TCF7, CEBPA, and SPI1 (PU.1) was performed as follows. For TCF7 immunoprecipitation, cells were double-crosslinked for 45 minutes at room temperature using 2 mM DSG (Thermo Fisher Scientific, 20593), followed by a crosslink for 10 minutes at room temperature with 1% formaldehyde. When using antibodies against CTCF, CEBPA, or SPI1, cells were single-crosslinked for 10 minutes at room temperature with 1% formaldehyde. All reactions were quenched with glycine at a final concentration of 0.125 M. For the TCF7 ChIP, cells were washed 3 times with cell lysis buffer A (10 mM Tris-HCl pH 7.5, 10 mM NaCl, 3 mM MgCl2, 0.5% NP40). After the last wash, cells were lysed in sonication lysis buffer (0.8% SDS, 160 mM NaCl, 10 mM Tris-HCl pH 7.5, 10 mM NaCl, 3 mM MgCl2, 1 mM CaCl2, and 4% NP40). For the CEBPA or SPI1 ChIP, cells were lysed in lysis buffer B (50 mM Tris pH 8, 10 mM EDTA, 1% SDS). For the CTCF ChIP, cells were lysed in lysis buffer C (10 mM HEPES/NaOH, 85 mM NaCl, 1 mM EDTA) followed by a nuclear lysis buffer (50 mM Tris-HCl pH 7.5, 1% SDS, 0.5% EMPIGEN, 10 mM EDTA). All lysates were sonicated on a Bioruptor Pico sonication device (Diagenode). Except for the CTCF ChIP, chromatin was diluted 4 to 5 times using IP dilution buffer (1.1% Triton X-100, 0.01% SDS, 167 mM NaCl, 16.7 mM Tris-HCl pH 8.0 and 1.2 mM). A 2% sample was saved as a chromatin input control. A total of 2 µg of antibody against TCF7 (Cell Signaling 2203S), 5 µg of antibody against CEBPA (Santa Cruz Biotechnology SC9314) or 0.4 µg of antibody against SPI1 (Cell Signaling 2266S) was added to sonicated chromatin of 20 × 10⁶ cells, or 4 µg of antibody against CTCF (Cell Signaling 2899S) was added to sonicated chromatin of 2 × 10⁶ cells, and samples were incubated overnight at 4 °C.
Chromatin-bound antibody was precipitated with Protein G Dynabeads (Thermo Fisher Scientific) and washed with low-salt buffer (20 mM Tris pH 8, 2 mM EDTA, 1% Triton, 150 mM NaCl), high-salt buffer (20 mM Tris pH 8, 2 mM EDTA, 1% Triton, 500 mM NaCl), LiCl buffer (10 mM Tris, 1 mM EDTA, 0.25 mM LiCl, 0.5% IGEPAL, 0.5% sodium deoxycholate) and TE (10 mM Tris pH 8, 1 mM EDTA). In the TCF7 or CTCF ChIP, chromatin was eluted in elution buffer A (0.1 M sodium hydrogen carbonate, 1% SDS). In the CEBPA or SPI1 ChIP, chromatin was eluted in elution buffer B (25 mM Tris, 10 mM EDTA, 0.5% SDS). Crosslinks were reversed overnight at 65 °C in the presence of proteinase K (New England Biolabs P8107S), and DNA was extracted using a MinElute PCR Purification Kit (Qiagen 28004). Sequencing libraries were prepared using the MicroPlex Library v3 (Diagenode C05010001) preparation kit combined with dual indexes (Diagenode C05010008) and sequenced on the NovaSeq 6000 platform (Illumina).

ChIP-seq reads were aligned to the human reference genome build hg19 with either bowtie⁷⁵ (v1.1.1) for single-end data or bowtie2⁸⁴ (v2.3.4.1) for paired-end data. Bigwig files were generated for visualization with deepTools bamCoverage⁷⁶ (v3.5.1) using the options --normalizeUsing RPKM --smoothLength 100 --binSize 20. For data with narrow read distributions (H3K27ac, CTCF), peak calling was performed with MACS2⁷⁷ (v2.2.7.1) using default settings, and the resulting peaks were filtered against the ENCODE blacklisted regions⁷⁸. For H3K27me3, which is found in broad domains, peak calling was performed with EPIC2⁸⁵.

Assay for transposase-accessible chromatin using sequencing (ATAC-seq)

ATAC-seq was essentially carried out as described⁸⁶. Briefly, prior to transposition the viability of the cells was assessed, and 1 × 10⁶ cells were treated in culture medium with DNase I (Sigma) at a final concentration of 200 U ml⁻¹ for 30 minutes at 37 °C. After DNase I treatment, cells were washed twice with ice-cold PBS, and cell viability and cell count were assessed again. 5 × 10⁴ cells were aliquoted into a new tube and spun down at 500 × g for 5 minutes at 4 °C, and the supernatant was discarded completely. The cell pellet was resuspended in 50 µl of ATAC-RSB buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2) containing 0.1% NP40, 0.1% Tween-20, and 1% Digitonin (Promega), and incubated on ice for 3 minutes to lyse the cells. The lysis buffer was washed out with 1 ml of ATAC-RSB buffer containing 0.1% Tween-20. Nuclei were pelleted at 500 × g for 10 minutes at 4 °C. The supernatant was discarded carefully, and the nuclear pellet was resuspended in 50 µl of transposition mixture (25 µl 2× tagment DNA buffer, 2.5 µl transposase (100 nM final; Illumina), 16.5 µl PBS, 0.5 µl 1% digitonin, 0.5 µl 10% Tween-20, 5 µl H2O) by pipetting up and down six times. The reaction was incubated at 37 °C for 30 minutes with mixing before the DNA was purified using the Monarch PCR & DNA Cleanup Kit (NEB) according to the manufacturer's instructions. Purified DNA was eluted in 20 µl elution buffer (EB), and 10 µl of purified sample was subjected to a ten-cycle PCR amplification using Nextera i7- and i5-index primers (Illumina). Purification and size selection of the amplified DNA were carried out with Agencourt AMPure XP beads, using a sample-to-beads ratio of 1:1.8 for purification and 1:0.55 for size selection. Purified samples were eluted in 15 µl of EB.
The quality and concentration of the generated ATAC libraries were analyzed using the High Sensitivity D1000 ScreenTape Kit (Agilent) and libraries were sequenced paired-end on a NovaSeq 6000 (Illumina).

ATAC-seq reads were aligned to the human reference genome build hg19 with bowtie2⁸⁴ (v2.3.4.1), which is recommended for longer reads, and mitochondrial and duplicate reads were excluded. Bigwig files were generated as described above. Peak calling was also performed with MACS2⁷⁷ (v2.2.7.1), but with the following settings: --nomodel --shift 100 --extsize 200. The resulting peaks were filtered against the ENCODE blacklisted regions⁷⁸.

Hi-C

Low-input Hi-C was performed using 12k flow-sorted cells as previously reported⁸⁷, with the following procedural modification: the mock PCR amplification was monitored by qPCR instead of on an agarose gel. This was done by removing the magnetic beads and adding 20× SYBR (Biotium 3100) to a small aliquot of the PCR reaction after two cycles. Amplified libraries were sequenced on a NovaSeq 6000 instrument.

Hi-C data were first processed with HiCUP⁸⁸ v0.8.2, a pipeline for mapping and processing Hi-C data that removes technical artifacts and other invalid or uninformative di-tags. As part of this pipeline, the reads were aligned to the human reference genome build hg19 using Bowtie2⁸⁴ v2.3.4.1. Filtered di-tags were then extracted with the script hicup2juicer and subsequently binned with juicer tools pre⁸⁹ v1.22.01 at the default resolutions of 2.5 Mb, 1 Mb, 500 kb, 250 kb, 100 kb, 50 kb, 25 kb, 10 kb, and 5 kb. The resulting .hic files were used for visualization. TADs and loops were identified for each group of leukemias with the findTADsAndLoops.pl find command of the HOMER suite with the parameters -res 5000 and -window 10000. Loops and TADs from each group were then combined into a single list with the merge2Dbed.pl script, and individual scores were calculated for each sample with findTADsAndLoops.pl score using the same settings as above. These scores were imported into DESeq2⁹⁰ (v1.24.0) with the DESeqDataSetFromMatrix function and transformed with varianceStabilizingTransformation for unsupervised clustering analysis, as described below for the other epigenomics data. Differences in TAD and loop scores between conditions were computed with a two-tailed Wald test using the DESeq function. The Benjamini–Hochberg method was applied to correct for multiple hypothesis testing with an FDR < 0.05. The results were visualized with plotgardener⁹¹.

Because each dataset was sequenced at a relatively low depth of coverage (average = 398 M paired-end reads, 300 M valid pairs, and 155 M unique pairs), identification of 3D organization features was conducted in aggregate for each group and subsequently merged into a master list. The 4537 TADs and 9443 loops detected across all datasets were comparable to previous Hi-C results, such as 5975 domains and 6058 loops in K562 or 9274 domains and 9449 loops in GM12878⁴³. Roughly 40-50% of CTCF sites overlapped with TAD boundaries and 10-20% with loop anchors, with minimal differences between variable and unchanged peaks (Supplementary Fig. 12a).

Bioinformatics and data visualization

Statistical tests were conducted in R version 4.1.0 unless otherwise specified. Most plots were generated using the ggplot2 R package, whereas heatmaps were created with ComplexHeatmap⁹² and genomic regions were visualized with plotgardener⁹¹.

An extended description of the methods, with additional details on data generation and analyses, is provided in the Supplementary Information.

Identification of functional regions

Putative enhancer and promoter regions were defined for genome-wide quantification of methylation. In both cases, they were defined by three complementary criteria: (a) relative position to genes, (b) telltale histone marks, and (c) eRNA expression. For enhancer identification, we constructed a consensus collection of H3K27ac-marked regions present in 3 or more samples from the CIMP, AML, T-ALL, and HSPC groups, excluding peaks that overlapped with 1 kb windows around transcriptional start sites (TSS) by at least 5% of their width. This list was intersected with a collection of open chromatin regions derived from the same groups (detectable in at least 3 samples) and with putative enhancers detected by CAGE-seq by the FANTOM consortium⁹³ (human_permissive_enhancers_phase_1_and_2.bed). For promoter identification, we downloaded the CAGE peaks assigned to TSS by the FANTOM consortium (hg19.cage_peak_phase1and2combined_tpm_ann.osc.txt.gz), excluded those not expressed in healthy hematopoietic cells, and intersected them with H3K4me3 peaks from CD34+ cells obtained from ENCODE⁷⁸.
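The TSS-exclusion step for enhancer candidates can be sketched as follows. This is a minimal, quadratic-time Python illustration under stated assumptions: intervals are 0-based half-open `(chrom, start, end)` tuples, and the "1 kb window around the TSS" is taken to mean ±500 bp (`half_window=500`), which is our assumption rather than a documented parameter. Function names are hypothetical; on genome-scale data the same filter is what `bedtools intersect -v -f 0.05` expresses.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of base pairs shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def exclude_tss_proximal(peaks: Iterable[Interval], tss: Iterable[Tuple[str, int]],
                         half_window: int = 500, min_fraction: float = 0.05) -> List[Interval]:
    """Drop peaks that overlap any TSS window by >= min_fraction of the peak width."""
    windows = [(chrom, pos - half_window, pos + half_window) for chrom, pos in tss]
    kept = []
    for chrom, start, end in peaks:
        width = end - start
        tss_proximal = any(
            w_chrom == chrom and overlap_bp(start, end, w_start, w_end) >= min_fraction * width
            for w_chrom, w_start, w_end in windows
        )
        if not tss_proximal:
            kept.append((chrom, start, end))
    return kept


# Hypothetical H3K27ac peaks and TSS positions
peaks = [("chr1", 1000, 2000), ("chr1", 10_000, 11_000), ("chr2", 500, 1500)]
tss = [("chr1", 1900), ("chr2", 5000)]
print(exclude_tss_proximal(peaks, tss))  # the first peak shares 600 bp with the chr1 TSS window
```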

CpG islands, identified according to the original criteria of Gardiner-Garden and Frommer⁹⁴, were downloaded from the UCSC browser.
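The Gardiner-Garden and Frommer criteria are usually summarized as a stretch of at least 200 bp with a GC content above 50% and an observed-to-expected CpG ratio above 0.6, where the expected CpG count is (C × G) / length. The Python sketch below checks these thresholds for a single, already delimited sequence. It is not the UCSC island-calling procedure, which additionally scans the genome to locate candidate regions, and the exact boundary conventions (> versus ≥) are an assumption.

```python
def cpg_island_stats(seq: str) -> dict:
    """Length, GC fraction and observed/expected CpG ratio of a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # CpG dinucleotides cannot overlap, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0  # observed / ((C * G) / length)
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp_cpg": obs_exp}


def meets_ggf_criteria(seq: str, min_length: int = 200,
                       min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """True if the sequence satisfies length, GC content and CpG o/e thresholds."""
    st = cpg_island_stats(seq)
    return (st["length"] >= min_length
            and st["gc_fraction"] > min_gc
            and st["obs_exp_cpg"] > min_obs_exp)


# GC-rich and CpG-rich versus GC-rich but CpG-depleted
print(meets_ggf_criteria("CG" * 100), meets_ggf_criteria("C" * 100 + "G" * 100))
```

The second example shows why the observed-to-expected ratio is needed: a sequence can be entirely G+C and still contain almost no CpG dinucleotides.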

For the comparison of genome-wide methylation levels at these functional regions, we used Welch's t-test, which, although it assumes normally distributed data, is quite robust to deviations from normality when sample sizes are very large, by virtue of the central limit theorem⁹⁵. The effect size was calculated as Cohen's d (Eq. 1), which standardizes the difference between two means by the pooled standard deviation:

$$d=\frac{\bar{x}_{1}-\bar{x}_{2}}{\sqrt{\left(s_{1}^{2}+s_{2}^{2}\right)/2}} \qquad (1)$$

where $\bar{x}_1$ and $\bar{x}_2$ are the means of the two groups, and $s_1$ and $s_2$ are their standard deviations.

The result is the number of standard deviation units that separate the two groups, with d = 1 indicating that the mean of one group is 1 standard deviation larger than that of the other. Typically, values below 0.2 are considered small and values above 0.8 large⁹⁶.
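Both statistics can be illustrated with a short, standard-library-only Python sketch. It computes Welch's t statistic with the Welch–Satterthwaite degrees of freedom and Cohen's d exactly as in Eq. (1). As a labeled simplification, the two-sided p value uses the normal approximation to the t distribution, which is reasonable only for the very large samples discussed here; `scipy.stats.ttest_ind(x, y, equal_var=False)` gives the exact Welch p value. The toy methylation values are invented.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t, Welch–Satterthwaite df and a normal-approximation two-sided p value."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    p_normal = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|)); large-sample only
    return t, df, p_normal


def cohens_d(x, y):
    """Cohen's d as in Eq. (1): mean difference over sqrt((s1^2 + s2^2) / 2)."""
    return (mean(x) - mean(y)) / math.sqrt((variance(x) + variance(y)) / 2)


# Invented methylation-like values for two groups
cimp = [0.80, 0.90, 0.85, 0.95]
aml = [0.50, 0.60, 0.55, 0.65]
t, df, p = welch_t(cimp, aml)
print(f"t = {t:.2f}, df = {df:.1f}, d = {cohens_d(cimp, aml):.2f}")
```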

Quantification and differential analysis of peak-based data

Peak signal (MCIP-seq, ChIP-seq, ATAC-seq) was quantified with the DiffBind R package⁹⁷ as follows. First, all peaks were combined in a single master list using the default settings of the package, keeping only peaks present in at least 2 samples and removing chromosomes not present in the primary assembly and unassigned sequences. Only peaks with a –log(q value) higher than 10, as determined by MACS2, were considered. Overlapping peaks across multiple samples were combined into a single entry. The dba.blacklist function was used with the greylist argument set to false to remove only blacklisted regions. Then, reads mapping to this master list were counted for each sample, subtracting reads mapping to input DNA samples processed in the same way.
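The construction of the master peak list can be pictured with the simplified Python sketch below: overlapping peaks from all samples are merged per chromosome, and a merged region is kept only if peaks from at least `min_samples` different samples contributed to it (2, as in the text). This illustrates the idea rather than DiffBind's implementation; score filtering, peak re-centering and input subtraction are not reproduced, and the input layout (sample name mapped to a list of half-open intervals) is an assumption.

```python
from collections import defaultdict


def consensus_peaks(peaks_by_sample: dict, min_samples: int = 2) -> list:
    """Merge overlapping peaks across samples; keep regions supported by >= min_samples samples."""
    by_chrom = defaultdict(list)
    for sample, peaks in peaks_by_sample.items():
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end, sample))
    consensus = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end, samples = intervals[0][0], intervals[0][1], {intervals[0][2]}
        for start, end, sample in intervals[1:]:
            if start < cur_end:  # overlaps the current merged region (half-open intervals)
                cur_end = max(cur_end, end)
                samples.add(sample)
            else:  # close the current region and start a new one
                if len(samples) >= min_samples:
                    consensus.append((chrom, cur_start, cur_end))
                cur_start, cur_end, samples = start, end, {sample}
        if len(samples) >= min_samples:
            consensus.append((chrom, cur_start, cur_end))
    return consensus


example = {
    "s1": [("chr1", 100, 200), ("chr1", 500, 600)],
    "s2": [("chr1", 150, 250)],
    "s3": [("chr2", 10, 50)],
}
print(consensus_peaks(example))  # only chr1:100-250 is supported by two samples
```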

For differential analysis, data were normalized using the trimmed mean of M values (TMM)⁹⁸ with the sum of reads in consensus peaks as the library size (argument normalize=DBA_NORM_TMM in DiffBind). These normalization factors and the raw counts were passed to DESeq2 (v1.34.0) with the dba.analyze command, and differential regions were identified as those with a false discovery rate (FDR) < 0.05 by the Benjamini–Hochberg method⁹⁰. Peaks were annotated to the closest gene with the ChIPpeakAnno package⁹⁹.
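DESeq2 and DiffBind report Benjamini–Hochberg-adjusted p values themselves; the minimal Python sketch below only shows what the adjustment does, following the usual step-up definition (the same as R's `p.adjust(method = "BH")`). DESeq2's independent filtering, which can set some adjusted p values to NA, is not reproduced, and the p values in the example are invented.

```python
def benjamini_hochberg(pvalues):
    """Benjamini–Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity with a running minimum
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
significant = [q < 0.05 for q in qvals]  # the FDR threshold used in these analyses
print(qvals, significant)
```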

Identification of transcription factor binding sites enriched at peak sets was conducted using LOLA¹⁰⁰. For visualization, results were aggregated by TF, selecting the dataset with the lowest p value from a hematopoietic tissue, if possible, or another tissue otherwise.
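The aggregation rule for the enrichment results (one dataset per TF, preferring hematopoietic tissues) can be written as a small selection function. The Python sketch below assumes each result is a dictionary with hypothetical keys `tf`, `hematopoietic`, `p_value` and `dataset`; these are not LOLA's actual column names, and the values are invented.

```python
from typing import Dict, List


def best_dataset_per_tf(results: List[dict]) -> Dict[str, dict]:
    """For each TF, pick the lowest-p dataset, preferring hematopoietic tissues when any exist."""
    best: Dict[str, dict] = {}
    for rec in results:
        # Rank key: hematopoietic datasets first (False sorts before True), then by p value
        key = (not rec["hematopoietic"], rec["p_value"])
        current = best.get(rec["tf"])
        if current is None or key < (not current["hematopoietic"], current["p_value"]):
            best[rec["tf"]] = rec
    return best


results = [
    {"tf": "SPI1", "hematopoietic": True, "p_value": 1e-5, "dataset": "a"},
    {"tf": "SPI1", "hematopoietic": False, "p_value": 1e-20, "dataset": "b"},
    {"tf": "CTCF", "hematopoietic": False, "p_value": 1e-3, "dataset": "c"},
]
print({tf: rec["dataset"] for tf, rec in best_dataset_per_tf(results).items()})
```

Note that the hematopoietic SPI1 dataset wins even though a non-hematopoietic one has a smaller p value, which is the intended preference.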

Clustering of transcriptional and epigenomics data

MCIP-seq, RNA-seq, ATAC-seq, and ChIP-seq data were processed similarly to identify relevant relationships between CIMP leukemias and other diseases. Briefly, raw counts were imported into DESeq2 with the DESeqDataSetFromMatrix function and transformed with varianceStabilizingTransformation to reduce the dependence of the variance on the mean. The 5000 regions or genes with the highest variance in transformed counts were selected for further analysis. PCA, multidimensional scaling (MDS), t-distributed Stochastic Neighbor Embedding (t-SNE), and UMAP were used for dimensionality reduction and visualized with ggplot2. Since results were highly comparable across different strategies (Supplementary Fig. 2), only PCA and UMAP were used in the rest of the figures. Moreover, heatmaps of either Pearson correlation or Euclidean distances between samples were created with ComplexHeatmap⁹².
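The selection of the most variable features followed by PCA can be sketched with NumPy as below. This is a minimal illustration, not the authors' code: it assumes the input is already variance-stabilized (for example, the output of DESeq2's `varianceStabilizingTransformation`) and arranged as features × samples, and it centers features without scaling them. The simulated data and the smaller `n_top` in the example are invented for a quick run.

```python
import numpy as np


def top_variable_pca(vst: np.ndarray, n_top: int = 5000, n_components: int = 2):
    """PCA on the n_top most variable rows of a features x samples matrix.

    Returns sample scores (samples x n_components) and the fraction of
    variance explained by each returned component.
    """
    row_var = vst.var(axis=1)
    top = np.argsort(row_var)[::-1][:n_top]  # indices of the most variable features
    x = vst[top].T                             # samples x selected features
    x = x - x.mean(axis=0)                     # center each feature (no scaling)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Simulated data: 1000 features, 6 samples; the last 3 samples are shifted in 50 features
rng = np.random.default_rng(0)
vst = rng.normal(size=(1000, 6))
vst[:50, 3:] += 5.0
scores, explained = top_variable_pca(vst, n_top=200)
print(scores[:, 0], explained)
```

The sign of each principal component is arbitrary, so only the relative positions of samples along a component are meaningful.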

In the clustering analyses of ATAC-seq data, patient UKR021 was excluded due to excessively low quality (hence, n = 81). However, it was included in other analyses performed with those data.

Analysis of MethylationEPIC array data

Illumina Infinium MethylationEPIC datasets compiled from CIMP AML, T-ALL²⁵ (GSE147667), and AML²⁶ (GSE159907) were subjected to quality control, preprocessing, and normalization using RnBeads¹⁰¹. First, we filtered out SNP-overlapping probes, cross-reactive probes, sites outside of CpG context, and those mapping to sex chromosomes, as well as probes with detection p value > 0.05, and sites covered by fewer than three beads. Next, beta-values were normalized and background subtraction was performed using the scaling.internal and sesame.noobsb options.

For comparison of global methylation at different genomic regions, CpG islands were defined according to RnBeads annotation. Illumina MethylationEPIC annotation files were used to map CpG sites to FANTOM enhancers⁹³ and gene bodies, and to match CpG sites to their associated genes. Promoters were defined as the region 1500 bp upstream and 500 bp downstream of transcription start sites.
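Because "upstream" depends on the strand of the gene, the promoter definition (1500 bp upstream to 500 bp downstream of the TSS) flips on the minus strand. A minimal Python sketch, assuming 0-based half-open coordinates; the exact off-by-one convention for the TSS base is our assumption and is not specified in the text, and the function names are hypothetical.

```python
def promoter_window(tss: int, strand: str, upstream: int = 1500, downstream: int = 500):
    """(start, end) of a promoter around a 0-based TSS, half-open and strand-aware."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream  # upstream lies at higher coordinates
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return max(0, start), end  # clip at the chromosome start


def cpg_in_promoter(cpg_pos: int, tss: int, strand: str) -> bool:
    """True if a CpG position falls inside the promoter window of the given TSS."""
    start, end = promoter_window(tss, strand)
    return start <= cpg_pos < end


print(promoter_window(10_000, "+"), promoter_window(10_000, "-"))
```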

Differential methylation between groups was computed at the level of CpG sites using RnBeads. CpG sites with absolute mean beta value difference > 0.2 and FDR-adjusted p value < 0.05 were considered differentially methylated. Among differentially methylated sites, enrichments of transcription factor binding sites and gene ontologies were calculated using the LOLA¹⁰⁰ and clusterProfiler R¹⁰² packages, respectively.

Salmon¹⁰³ was used to quantify the expression of individual transcripts, which were subsequently aggregated to estimate gene-level abundances with the R package tximport¹⁰⁴. Human gene annotation derived from GENCODE¹⁰⁵ v30 was downloaded as a GTF file and used for the quantification. Both gene- and transcript-level abundances were normalized to counts per million for visualization in the figures of this paper. Differential gene expression analysis of count estimates from Salmon was performed with DESeq2⁹⁰ v1.34.0. The Benjamini–Hochberg method was applied to correct for multiple hypothesis testing with an FDR < 0.05.
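The gene-level aggregation and counts-per-million scaling can be illustrated with the standard-library sketch below. It follows the simple summation of estimated transcript counts per gene, which is what tximport does with its default `countsFromAbundance = "no"`; the average-transcript-length offsets that tximport also passes to DESeq2 are not reproduced, and the transcript identifiers and `tx2gene` mapping are hypothetical placeholders.

```python
from collections import defaultdict


def gene_counts_from_transcripts(tx_counts: dict, tx2gene: dict) -> dict:
    """Sum estimated transcript counts into gene-level counts."""
    genes = defaultdict(float)
    for tx, count in tx_counts.items():
        genes[tx2gene[tx]] += count
    return dict(genes)


def counts_per_million(counts: dict) -> dict:
    """Scale counts so that they sum to one million within the sample."""
    total = sum(counts.values())
    return {feature: value / total * 1e6 for feature, value in counts.items()}


tx_counts = {"T1": 300.0, "T2": 100.0, "T3": 600.0}       # hypothetical transcripts
tx2gene = {"T1": "GATA3", "T2": "GATA3", "T3": "KLF4"}
genes = gene_counts_from_transcripts(tx_counts, tx2gene)
print(genes, counts_per_million(genes))
```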

Integration of MCIP-seq and RNA-seq data

To determine how changes in methylation relate to gene expression, we quantified MCIP-seq signal exclusively at promoters, defined on the basis of H3K4me3 peaks as explained above. Then, we computed differential methylation with DiffBind and left-joined the results with the differential expression results previously calculated with DESeq2, using gene names to match the records. The results were visualized as a starburst plot, in which the -log10(FDR) multiplied by the sign of the fold change is shown for the MCIP-seq data on the x-axis and for the RNA-seq data on the y-axis. To facilitate the identification of TFs involved in hematopoiesis (those within the GO term GO:0030097), they were highlighted with red circles and labeled.
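The starburst coordinates and the left join by gene name can be sketched as follows. The Python below is illustrative only: results are assumed to be dictionaries mapping gene name to `(FDR, log2 fold change)`, with fold changes expressed as CIMP relative to the comparison group; `GENEX` is a hypothetical gene, and FDR values of 0 are floored to avoid taking the logarithm of zero. Under these assumptions, points with a positive x value and a negative y value correspond to hypermethylated, downregulated genes.

```python
import math


def signed_log_fdr(fdr: float, log2_fold_change: float) -> float:
    """-log10(FDR) multiplied by the sign of the fold change."""
    sign = (log2_fold_change > 0) - (log2_fold_change < 0)
    return sign * -math.log10(max(fdr, 1e-300))  # floor avoids log10(0)


def starburst_points(methylation: dict, expression: dict) -> dict:
    """Left join on gene name: every methylation record is kept, expression may be missing."""
    points = {}
    for gene, (m_fdr, m_lfc) in methylation.items():
        x = signed_log_fdr(m_fdr, m_lfc)
        y = signed_log_fdr(*expression[gene]) if gene in expression else None
        points[gene] = (x, y)
    return points


methylation = {"CEBPA": (1e-10, 2.5), "GENEX": (0.5, -0.1)}
expression = {"CEBPA": (1e-8, -4.0)}
print(starburst_points(methylation, expression))
```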

Identification and analysis of small genetic variants

Single nucleotide variant (SNV) and small insertion/deletion (indel) detection was performed with a custom script that integrated variants called by multiple software tools, including HaplotypeCaller and Mutect2 from GATK v4.0.0¹⁰⁶, VarScan2¹⁰⁷, bcftools¹⁰⁸, Strelka2¹⁰⁹ and Pindel¹¹⁰. A highly optimized in-house tool (annotateBamStatistics, available at https://gitlab.com/erasmusmc-hematology/annotatebamstatistics) was then used to compute the variant allele frequency (VAF) of every variant as well as position-specific metrics such as strand bias, number of clipped reads or the number of alternative alignments (Supplementary Data 2). The combined list of variants was subjected to stringent filtering to remove low-quality positions, considering the following criteria:

a. strand bias between 0 and 1 for regions within the exome capture (+200 bp)
b. total sequencing depth of at least 8 reads, with at least 4 reads supporting the variant allele
c. alignment quality of 40 or more and base-calling quality score of 30 or more
d. fewer than 40% of reads supporting a base other than the reference and alternative alleles
e. at most 10% of reads with an alternative alignment or a superior alternative alignment score in bwa (XS tag)
f. removal of extremely long indels (500 bp or more)
g. removal of variants in simple repeats as detected by RepeatMasker¹¹¹ (downloaded from UCSC)
h. removal of variants in highly repetitive genomic regions, defined as 95% or more identity to another region in the UCSC selfChain link files
i. removal of clusters of at least 3 SNVs located less than 5 bp from each other
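Criteria b to i above can be expressed as simple predicates over per-variant metrics. The following is a hedged sketch, not the in-house script: it assumes the metrics have already been computed (for example by a tool such as annotateBamStatistics), the `Variant` fields and function names are hypothetical, criterion a is omitted because its exact definition is not given here, and indel length is approximated as the difference in length between the REF and ALT alleles.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int                # total reads at the position
    alt_reads: int            # reads supporting the alternative allele
    mapq: float               # alignment quality
    baseq: float              # base-calling quality
    other_allele_frac: float  # fraction of reads showing neither REF nor ALT
    xs_frac: float            # fraction of reads with an alternative alignment (bwa XS)
    in_simple_repeat: bool    # overlaps a RepeatMasker simple repeat
    in_self_chain: bool       # overlaps a selfChain region with >= 95% identity


def passes_site_filters(v):
    """Apply criteria b to h to a single variant."""
    indel_len = abs(len(v.ref) - len(v.alt))
    return (
        v.depth >= 8 and v.alt_reads >= 4     # b
        and v.mapq >= 40 and v.baseq >= 30    # c
        and v.other_allele_frac < 0.40        # d
        and v.xs_frac <= 0.10                 # e
        and indel_len < 500                   # f
        and not v.in_simple_repeat            # g
        and not v.in_self_chain               # h
    )


def clustered_snvs(positions, min_size=3, max_gap=5):
    """Criterion i: positions (one chromosome) in runs of >= min_size SNVs
    whose consecutive distances are all < max_gap bp."""
    pos = sorted(set(positions))
    flagged = set()
    run = pos[:1]
    for p in pos[1:]:
        if p - run[-1] < max_gap:
            run.append(p)
        else:
            if len(run) >= min_size:
                flagged.update(run)
            run = [p]
    if len(run) >= min_size:
        flagged.update(run)
    return flagged
```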

Furthermore, because no matched control material was available for these patients, we selected likely somatic mutations among the variants identified by WES based on functional annotation with Annovar¹¹². We first retained variants meeting the following criteria: a) located in exons or in splice acceptor regions, b) non-synonymous SNVs or indels, and c) VAF of at least 1%. Single nucleotide polymorphisms (SNPs) with a population frequency higher than 0.0002 were excluded unless they were reported in the COSMIC database v94¹¹³ in at least 5 hematological cancers or were located in genes frequently mutated in clonal hematopoiesis (DNMT3A, TET2, ASXL1)¹¹⁴. Variants present in a healthy donor (not a matched control) were also removed to further eliminate common variants and technical artifacts. Moreover, variants in a blacklist of frequent non-somatic variants found in WES of AML and CD34+ cells were discarded. Finally, probable oncogenic variants were selected as those fulfilling one or more of the following conditions: (i) reported in the COSMIC database; (ii) frameshift, stopgain or startloss; (iii) predicted to be damaging by the majority of functional prediction tools, such as PolyPhen, SIFT and LRT.
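The selection logic of this paragraph can be summarized as two predicates. This is a hedged sketch: the dictionary keys, the annotation labels (loosely modeled on ANNOVAR's `Func` and `ExonicFunc` categories) and the way damaging predictions are counted are assumptions for illustration; only the thresholds and gene names come from the text.

```python
CH_GENES = {"DNMT3A", "TET2", "ASXL1"}  # genes frequently mutated in clonal hematopoiesis
NONSYNONYMOUS = {"nonsynonymous SNV", "stopgain", "stoploss", "startloss",
                 "frameshift insertion", "frameshift deletion",
                 "nonframeshift insertion", "nonframeshift deletion"}
TRUNCATING = {"frameshift insertion", "frameshift deletion", "stopgain", "startloss"}


def is_candidate_somatic(v, healthy_donor_keys, blacklist_keys):
    """Location/effect, VAF, population-frequency, healthy-donor and blacklist filters."""
    key = (v["chrom"], v["pos"], v["ref"], v["alt"])
    coding = v["func"] == "exonic" and v["exonic_func"] in NONSYNONYMOUS
    if not (coding or v["func"] == "splicing"):
        return False
    if v["vaf"] < 0.01:
        return False
    common = v["pop_af"] is not None and v["pop_af"] > 0.0002
    # Common SNPs are rescued if recurrent in hematological cancers or in CH genes
    rescued = v["cosmic_heme_count"] >= 5 or v["gene"] in CH_GENES
    if common and not rescued:
        return False
    return key not in healthy_donor_keys and key not in blacklist_keys


def is_probably_oncogenic(v):
    """True if in COSMIC, truncating, or called damaging by most predictors."""
    damaging_majority = v["n_damaging"] > v["n_predictors"] / 2
    return v["in_cosmic"] or v["exonic_func"] in TRUNCATING or damaging_majority
```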

Because some of these variants are difficult to interpret, the resulting list was further restricted to genes previously reported in leukemia (Disgenet database¹¹⁵) or cancer (COSMIC¹¹³), or relevant in hematopoiesis (GO term GO:0030097). This list (Supplementary Data 2) was used as input for the oncoPrint function of the ComplexHeatmap R package to show the distribution of mutations in this cohort.

To compare the mutational landscape of CIMP leukemias with that of other acute leukemias, we compiled data from published studies on AML^{27,116}, T-ALL^{2,117,118,119}, ETP-ALL^{23,118,119,120,121} and T/M MPAL^{24,122}. We then counted the occurrences of mutations in every gene and compared their frequencies between groups with a two-tailed Fisher's exact test.
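For each gene, this comparison reduces to a 2 × 2 table of mutated versus non-mutated patients in CIMP leukemias and in the comparison group. Below is a minimal, dependency-free sketch of the two-sided Fisher's exact test; it sums the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table, the convention used, for example, by R's `fisher.test`. The function name and table layout are illustrative assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows: groups (e.g. CIMP vs. a comparison cohort); columns: mutated vs. not mutated.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more likely than the observed one (small tolerance for ties)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```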

Copy number alteration detection

Copy number analysis of WES data was performed with CNVkit¹²³ v0.9.9. First, a pooled reference was generated from 12 datasets of healthy CD34+ cells (9 from adult bone marrow and 3 from cord blood). As recommended in the program's documentation, 5 kb regions of poor mappability were excluded from the analysis. Next, the reference was used to compute log2 copy ratios and to infer discrete copy number segments with the default settings of CNVkit. Finally, we derived absolute integer copy numbers for these segments with cnvkit.py call, and copy number alterations (CNAs) were computed at the gene level with cnvkit.py genemetrics. Copy number data were summarized across all AML samples and represented as a heatmap with ComplexHeatmap. Scatter plots of specific regions, such as NF1, were created with cnvkit.py scatter.
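The step from a segment's log2 copy ratio to an absolute integer copy number can be illustrated with a simple rounding rule. This sketch assumes a reference of known ploidy (diploid by default) and an optional tumor purity; it follows the general principle of purity-adjusted rescaling but is not CNVkit's implementation, and the function name and purity handling are assumptions.

```python
def integer_copy_number(log2_ratio, ploidy=2, purity=1.0):
    """Convert a segment log2 copy ratio to an absolute integer copy number.

    Assumes the ratio is relative to a reference of the given ploidy and that
    non-tumor cells (fraction 1 - purity) carry `ploidy` copies.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    ratio = 2.0 ** log2_ratio
    # observed ratio = (purity * CN + (1 - purity) * ploidy) / ploidy, solved for CN
    cn = (ratio * ploidy - (1 - purity) * ploidy) / purity
    return max(0, round(cn))
```

For a pure diploid sample, a log2 ratio of 0 maps to 2 copies, -1 to 1 copy and about 0.585 to 3 copies.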

These results were validated by orthogonal analyses with CNV Radar¹²⁴ on WES data and with Control-FREEC¹²⁵ on the input DNA sequencing data generated for the ChIP-seq and MCIP-seq experiments. For CNV Radar, common SNPs (dbSNP 151) were annotated in the variants called with bcftools call using the SnpSift¹²⁶ tool, as recommended in the instruction manual. This step ensures that the B-allele frequency (BAF) is calculated only from polymorphisms expected to be heterozygous, avoiding distortions introduced by potentially subclonal somatic mutations. A panel of non-matched normals was used as a control, analogously to the CNVkit analysis. Control-FREEC was run without controls in large windows of 100,000 bp to compensate for the low sequencing depth of the input libraries.
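The BAF restriction described above can be sketched as follows. This is an illustrative assumption rather than CNV Radar's code: sites are hypothetical (site_id, ref_reads, alt_reads, is_common_snp) tuples, and the minimum depth of 10 reads is a hypothetical parameter not stated in the text.

```python
def b_allele_frequencies(sites, min_depth=10):
    """BAF = ALT / (REF + ALT) read counts, restricted to known common SNPs.

    sites: iterable of (site_id, ref_reads, alt_reads, is_common_snp)
    """
    bafs = {}
    for site_id, ref_reads, alt_reads, is_common_snp in sites:
        depth = ref_reads + alt_reads
        # Skip sites not annotated as common SNPs (possibly somatic) and poorly covered sites
        if is_common_snp and depth >= min_depth:
            bafs[site_id] = alt_reads / depth
    return bafs


def allelic_imbalance(baf):
    """Distance from 0.5, the BAF expected at a heterozygous SNP in a balanced region."""
    return abs(baf - 0.5)
```

Consistently large imbalance across the SNPs of a segment would point to a copy number change or loss of heterozygosity.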

Fusion gene detection

Fusion genes were identified in RNA-seq reads with an ensemble of software tools, namely STAR-Fusion¹²⁷, FusionCatcher¹²⁸, Arriba¹²⁹, Pizzly¹³⁰, JAFFA¹³¹ and SQUID¹³². Results from these tools were integrated with fusion-report, a Python tool developed for the nf-core framework of bioinformatics pipelines¹³³. Fusion candidates were filtered with the databases bundled with FusionCatcher, thereby excluding those previously found in studies of healthy tissues or involving partners in close genomic proximity. Majority voting was then applied: candidates reported by at least 3 of the 6 tools were retained as the final fusion calls for each sample, and these were combined into a single master list.
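The majority-voting step can be sketched as a count of supporting tools per fusion. This minimal sketch assumes that fusion names have already been harmonized across tools (for example as "GENE_A--GENE_B"); the input structure and function names are hypothetical.

```python
from collections import Counter


def consensus_fusions(calls_by_tool, min_tools=3):
    """Fusions reported by at least `min_tools` tools for one sample.

    calls_by_tool: dict tool name -> iterable of harmonized fusion names
    """
    support = Counter()
    for fusions in calls_by_tool.values():
        support.update(set(fusions))  # each tool votes at most once per fusion
    return {fusion for fusion, n in support.items() if n >= min_tools}


def master_list(calls_by_sample, min_tools=3):
    """Combine the per-sample consensus calls into one sample -> fusions mapping."""
    return {sample: consensus_fusions(calls, min_tools)
            for sample, calls in calls_by_sample.items()}
```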

The combined list of fusions was further annotated based on their presence in fusion gene databases (FusionGDB, COSMIC and Mitelman) or previous reports of the same fusion in leukemia studies^{118,134,135,136}. Fusions involving genes associated with leukemia according to the Disgenet database¹¹⁵ were also annotated. The master list and the leukemia-related annotations were visualized with the oncoPrint function of the ComplexHeatmap R package.

Gene set enrichment analysis and identification of hematopoietic signatures

Pre-ranked gene set enrichment analysis (GSEA)¹³⁷ was performed with the fgsea R package, using the multilevel splitting Monte Carlo approach to calculate p values, with the settings minSize=15 and maxSize=5000. For every comparison, the input genes were ranked by -log10(FDR) multiplied by the sign of the fold change, both obtained from DESeq2. We used the MSigDB C5 collection, containing GO terms, to investigate enrichment for gene functions and biological processes¹³⁸. We also used a customized MSigDB C2 collection, comprising C2 v7.5.1 plus several hematopoiesis-related gene sets kindly provided by Dr Charles Mullighan. Moreover, we added gene sets derived from supervised comparisons between ETP-ALL and other T-ALLs^{22,23}, as well as a signature of leukemia induced in mouse DN2 thymocytes by a retrovirus coexpressing Myc and Bcl2¹³⁹. Both C2 and C5 were downloaded with the msigdbr R package.
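The statistic behind pre-ranked GSEA is a weighted running sum over the ranked list. The sketch below computes the enrichment score of Subramanian et al. for one gene set, assuming the standard weighting exponent of 1; it is an illustration, not fgsea's code, and it omits normalization and the multilevel p-value estimation. Function and argument names are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score of pre-ranked GSEA.

    ranked_genes: gene names sorted by decreasing ranking metric
    scores: ranking metric for each gene in the same order (e.g. signed -log10 FDR)
    gene_set: set of gene names to test
    weight: exponent applied to |score| for genes in the set (1 = standard GSEA)
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must overlap the ranking without covering all genes")
    running = best = 0.0
    for s, h in zip(scores, hits):
        # Step up by the weighted score for set members, down uniformly otherwise
        running += abs(s) ** weight / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best  # maximum deviation from zero, keeping its sign
```

A positive score indicates that the set is concentrated among genes with positive ranking metrics (here, significantly upregulated genes), a negative score the reverse.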

To evaluate the potential cell of origin of CIMP leukemias, we analyzed the samples with single-sample GSEA (ssGSEA) as implemented in the GSVA R package¹⁴⁰. For this analysis, we focused on gene sets specific for individual hematopoietic cell fractions, obtained from two previous studies^{141,142}. With the same goal, we used CIBERSORTx³⁰, which was originally designed to estimate cell type proportions in a mixture on the basis of a signature matrix. Signature matrices were generated from a single-cell RNA-seq dataset obtained from the Atlas of Human Blood Cells¹⁴³.
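Unlike pre-ranked GSEA, ssGSEA scores each sample separately by ranking that sample's genes by expression. The following simplified sketch integrates the difference between the rank-weighted cumulative distribution of set members and that of non-members, with a rank-weight exponent of 0.25 (the value commonly used for ssGSEA). It is an assumption-laden illustration, not the GSVA implementation, which additionally normalizes scores across samples; the function name is hypothetical.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample GSEA score for one sample.

    expression: dict gene -> expression value in this sample
    gene_set: set of gene names
    alpha: exponent applied to the rank weights
    """
    ordered = sorted(expression, key=expression.get, reverse=True)  # highest first
    n = len(ordered)
    weights = [(n - i) ** alpha for i in range(n)]  # top-ranked gene gets the largest weight
    hits = [g in gene_set for g in ordered]
    n_miss = n - sum(hits)
    hit_norm = sum(w for w, h in zip(weights, hits) if h)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must contain some, but not all, genes")
    p_hit = p_miss = total = 0.0
    for w, h in zip(weights, hits):
        if h:
            p_hit += w / hit_norm
        else:
            p_miss += 1.0 / n_miss
        total += p_hit - p_miss  # sum the difference between the two cumulative curves
    return total
```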

Integration of ATAC-seq and RNA-seq data

To characterize the transcriptional consequences of epigenetic changes in CIMP leukemias beyond promoter hypermethylation, we integrated chromatin accessibility with gene expression data. First, we classified ATAC-seq peaks as putative promoters if they overlapped a window of ±500 bp around a TSS, and the rest as putative enhancers. To assign putative enhancers to their target genes, we integrated multiple layers of information, namely the distance between the enhancer and the nearest promoter, the correlation between ATAC-seq peaks, the co-occurrence of both elements in the same TAD, contact through a chromatin loop, and previous assignment in the GeneHancer database¹⁴⁴. Next, we used DiffBind to compute differential ATAC-seq signal at putative enhancers successfully assigned to a target gene. For each of these genes, differences in the accessibility of their enhancers were linked to differences in their expression. The results of this analysis were visualized as a Starburst plot, as described for the integration of RNA-seq and MCIP-seq data in the corresponding section of the Methods.
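The promoter/enhancer split at the start of this paragraph can be sketched as an interval query against sorted TSS positions. The sketch assumes half-open peak coordinates and a per-chromosome sorted list of TSS positions; the function and argument names are hypothetical, and the subsequent multi-evidence enhancer-gene assignment is not reproduced.

```python
import bisect


def classify_peaks(peaks, tss_by_chrom, flank=500):
    """Label ATAC-seq peaks as putative promoters or putative enhancers.

    peaks: list of (chrom, start, end) with half-open coordinates [start, end)
    tss_by_chrom: dict chrom -> sorted list of TSS positions
    A peak is a promoter if it overlaps [TSS - flank, TSS + flank] for any TSS.
    """
    labels = []
    for chrom, start, end in peaks:
        tss = tss_by_chrom.get(chrom, [])
        # Overlap holds exactly when some TSS lies in [start - flank, end + flank)
        i = bisect.bisect_left(tss, start - flank)
        is_promoter = i < len(tss) and tss[i] < end + flank
        labels.append("promoter" if is_promoter else "enhancer")
    return labels
```

Binary search keeps each lookup logarithmic in the number of TSSs per chromosome, which matters for genome-wide peak sets.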

Analysis of motif activity in chromatin accessibility data

The regulatory networks driving the differentiation block in CIMP leukemias were investigated with chromVAR (v1.20.2), a tool that estimates TF motif activity by computing bias-corrected deviations in chromatin accessibility at motif-containing peaks relative to the expectation¹⁴⁵. First, motifs for human TFs were downloaded from JASPAR (v2022)¹⁴⁶ and filtered to exclude TFs whose genes were not expressed at 1 TPM or more in at least one leukemia sample of our RNA-seq cohort, yielding a total of 721 motifs. The consensus peaks derived from ATAC-seq data (see “Quantification and differential analysis of peak-based data”) were then scanned for matches to these motifs with the function matchMotifs from the motifmatchr package (v1.20.0). Having determined which peaks contained which motifs, we calculated genome-wide deviations in motif accessibility with computeDeviations, using the ATAC-seq counts computed by DiffBind. The variability of each motif's activity, defined as the standard deviation of the deviation Z-scores across the entire cohort, was computed with computeVariability. Differential motif accessibility was calculated as the difference in the mean deviation Z-scores between groups, and statistical significance was assessed with a two-tailed Wilcoxon test. The results of this analysis were depicted as a volcano plot.
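The core quantity behind chromVAR is the excess of fragments in motif-containing peaks relative to an expectation derived from each peak's share of all fragments and each sample's sequencing depth. The sketch below computes only this raw deviation for one motif; chromVAR additionally corrects for bias with background peak sets matched for GC content and average accessibility and converts the result into Z-scores, which is omitted here. The function name and input layout are hypothetical.

```python
def raw_motif_deviations(counts, motif_peaks):
    """Simplified chromVAR-style raw deviation of one motif in each sample.

    counts: list of rows, counts[i][j] = fragments in peak i for sample j
    motif_peaks: indices of peaks that contain a match to the motif
    Returns, per sample, (observed - expected) / expected fragments in motif peaks.
    """
    n_samples = len(counts[0])
    total = sum(sum(row) for row in counts)
    peak_share = [sum(row) / total for row in counts]  # fraction of all fragments per peak
    sample_total = [sum(row[j] for row in counts) for j in range(n_samples)]
    deviations = []
    for j in range(n_samples):
        observed = sum(counts[i][j] for i in motif_peaks)
        expected = sum(peak_share[i] * sample_total[j] for i in motif_peaks)
        deviations.append((observed - expected) / expected)
    return deviations
```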

Putative positive regulators (activators) were identified by correlating, across samples, the deviations estimated by chromVAR with the expression (in TPM) of the corresponding TF gene in the same sample. This approach is frequently used in single-cell data analysis and relies on the assumption that the binding sites of activators become more accessible when these TFs are expressed^{147,148,149,150}.
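A minimal sketch of this correlation step follows. The text does not state which correlation coefficient was used, so Pearson correlation is an assumption here, as are the function names and the requirement that deviations and TPM values are listed in the same sample order.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def putative_activators(deviation_by_tf, tpm_by_tf, min_r=0.0):
    """TFs whose motif deviations correlate positively with their own expression.

    Both dicts map a TF name to values listed in the same sample order.
    Returns (tf, r) pairs sorted by decreasing r.
    """
    pairs = [(tf, pearson(dev, tpm_by_tf[tf]))
             for tf, dev in deviation_by_tf.items() if tf in tpm_by_tf]
    # Comparisons with NaN are False, so TFs with constant vectors are dropped
    return sorted((p for p in pairs if p[1] > min_r), key=lambda p: p[1], reverse=True)
```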

Furthermore, we performed TF footprinting with TOBIAS¹⁵¹, a software suite specifically designed for the analysis of ATAC-seq data. Because high sequencing depth is recommended for footprinting and TOBIAS does not handle replicates, we first used samtools merge to aggregate multiple samples from each group: all available CIMP cases (n = 9) and a representative subset of AML (n = 20) and T-ALL (n = 20) cases. We called peaks on each merged BAM file as described before and, for each comparison, merged the peak files of the two conditions involved. Then, we ran TOBIAS ATACorrect to generate bigwig tracks corrected for the Tn5 sequence bias of ATAC-seq, followed by TOBIAS FootprintScores to calculate a footprint score based on the depletion of signal relative to neighboring regions. Finally, TOBIAS BINDetect combined these footprint scores with TF binding motifs from JASPAR (filtered as described above). The results of this analysis were depicted as a volcano plot. Moreover, TOBIAS PlotAggregate was used to visualize the aggregated ATAC-seq signal at selected TF motifs in various conditions of interest.
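The idea of a footprint, a local dip in bias-corrected cut-site signal where a bound TF protects DNA from Tn5, can be illustrated with a simple depletion score: mean flanking signal minus mean signal over the motif core. This is a simplified stand-in written for illustration; TOBIAS computes its footprint scores differently in detail, and the function name, window sizes and input layout are assumptions.

```python
def footprint_depletion(signal, center, half_width, flank):
    """Mean flanking signal minus mean core signal (higher = deeper footprint).

    signal: list of bias-corrected cut-site values around a motif site
    center: index of the motif centre; half_width: core half-width;
    flank: number of positions used on each side of the core
    """
    core_start, core_end = center - half_width, center + half_width + 1
    left_start, right_end = core_start - flank, core_end + flank
    if left_start < 0 or right_end > len(signal):
        raise ValueError("window extends beyond the signal")
    core = signal[core_start:core_end]
    flanks = signal[left_start:core_start] + signal[core_end:right_end]
    return sum(flanks) / len(flanks) - sum(core) / len(core)
```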

Integration of Hi-C data with gene expression and CTCF binding

TAD boundaries were defined as 5000 bp regions (matching the resolution used for TAD calling) centered on the TAD borders. CTCF binding sites overlapping these regions were identified, but for each boundary only the peak with the smallest FDR in the respective comparison was kept. Similarly, MCIP-seq peaks overlapping boundaries were selected based on their FDR in each comparison. Differential expression of genes within TADs was also incorporated. This information is summarized in Supplementary Data 40, which was used to identify variable TADs with a) significant changes in CTCF binding at their boundaries and b) differentially expressed genes.
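The boundary definition and the per-boundary peak selection can be sketched as follows. The inputs (borders as (chrom, position), peaks as (chrom, start, end, FDR) from a single comparison, half-open intervals) and the function names are hypothetical; a linear scan is used for clarity, whereas an interval tree or bedtools would be preferable at genome scale.

```python
def boundary_windows(borders, width=5_000):
    """Windows of `width` bp centred on TAD borders given as (chrom, position)."""
    half = width // 2
    return [(chrom, pos - half, pos + half) for chrom, pos in borders]


def best_peak_per_boundary(boundaries, peaks):
    """For each boundary, keep the overlapping peak with the smallest FDR.

    boundaries: (chrom, start, end); peaks: (chrom, start, end, fdr) from one
    comparison; all intervals half-open. Boundaries without peaks are omitted.
    """
    best = {}
    for chrom, b_start, b_end in boundaries:
        overlapping = [p for p in peaks
                       if p[0] == chrom and p[1] < b_end and b_start < p[2]]
        if overlapping:
            best[(chrom, b_start, b_end)] = min(overlapping, key=lambda p: p[3])
    return best
```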

Loop anchors were defined according to the coordinates provided by HOMER, with an extra padding of 5000 bp on each side to account for the resolution used in loop calling. Enhancers and promoters (see “Identification of functional regions”) located within 25,000 bp of loop anchors were identified. This allowed us to select enhancer-promoter loops as those with an enhancer near one anchor and a promoter near the other (Supplementary Data 41). For loop anchors attached to a promoter, differential expression of the corresponding gene between the conditions of interest was also retrieved.
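The selection of enhancer-promoter loops can be sketched as follows, using the 5000 bp padding and 25,000 bp distance stated above. The tuple layouts for loops and elements (chrom, start, end, name), the half-open intervals and the function names are hypothetical.

```python
def interval_gap(s1, e1, s2, e2):
    """Distance between half-open intervals [s1, e1) and [s2, e2); 0 if they overlap."""
    return max(0, s2 - e1, s1 - e2)


def elements_near(anchor, elements, pad=5_000, max_dist=25_000):
    """Elements (chrom, start, end, name) within max_dist of a padded loop anchor."""
    chrom, start, end = anchor
    s, e = start - pad, end + pad
    return [el for el in elements
            if el[0] == chrom and interval_gap(s, e, el[1], el[2]) <= max_dist]


def enhancer_promoter_loops(loops, enhancers, promoters, pad=5_000, max_dist=25_000):
    """Keep loops with an enhancer near one anchor and a promoter near the other.

    Returns (anchor1, anchor2, promoters near either anchor) for each selected loop.
    """
    selected = []
    for a1, a2 in loops:
        e1 = elements_near(a1, enhancers, pad, max_dist)
        p1 = elements_near(a1, promoters, pad, max_dist)
        e2 = elements_near(a2, enhancers, pad, max_dist)
        p2 = elements_near(a2, promoters, pad, max_dist)
        if (e1 and p2) or (e2 and p1):
            selected.append((a1, a2, p1 + p2))
    return selected
```

The promoters returned with each loop are what would be used to look up differential expression of the attached genes.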

Statistics and reproducibility

Unless otherwise specified, the analyses in this study used the sample sizes shown at the bottom of Supplementary Data 1. No sample size calculation was performed, as the sample size was determined by the availability of samples and/or data. However, sample sizes for all the main phenotypic groups studied here (CIMP, AML, T-ALL and CD34+) were large enough to robustly identify statistically significant differences. Briefly, we collected samples from 376 leukemia patients and 19 healthy donors in total, although not all data types were generated for every individual. CIMP cases (n = 14) were the same across all datasets, but the composition of the T-ALL, AML and CD34+ groups varied. For AML, a cohort representative of recurrent genetic abnormalities was used (n = 224). RNA-seq data were previously available from another study¹⁵², whereas MCIP-seq, ChIP-seq, ATAC-seq and Hi-C data were generated specifically for this study in a subset of those patients. Similarly, RNA-seq data of T-ALL patients (n = 114) were available from another ongoing study and were supplemented with additional epigenomic data. The T-ALL patients profiled with MCIP-seq were not included in any other experiment. ATAC-seq data from AML patient UKR201 were excluded from the clustering analysis because of low quality, but not from other analyses.

Every measurement of the same type was taken from a different sample, i.e., no technical replicates were produced. The reproducibility of the findings was assessed by incorporating orthogonal evidence. For example, publicly available MethylationEPIC array data from T-ALL²⁵ and AML²⁶ were used to confirm that CIMP patients exhibit strong methylation signatures similar to those of ETP-ALL. Randomization was not applicable to the current study because CIMP leukemias were compared to those without this phenotype. Patients included in control groups (AML, T-ALL) were selected in such a way that they reflected the variety of mutational backgrounds in those leukemias. Donors for CD34+ cells were randomly selected.

The statistical tests used in the analysis of these data are described in the corresponding sections of the Methods and, where appropriate, in the legends of the figures and tables presenting the results. Generally, we used a two-sided Wald test for count data derived from omics techniques; a two-sided Wilcoxon test for pairwise comparisons of other data, such as average methylation levels at different genomic features; and Fisher's exact test for enrichment (one-sided) and association (two-sided) analyses.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The raw RNA-seq data of AML patients have been previously used in another study¹⁵² and are available at the European Genome-phenome Archive (EGA) under accession number EGAS00001004684. All the other raw sequencing data derived from donors or patients have been generated in this study and are deposited at the EGA under accession number EGAS00001007094. This EGA study includes the following datasets: MCIP-seq, RNA-seq, ATAC-seq, ChIP-seq (H3K27ac, CTCF, SPI1, CEBPA, TCF7), and Hi-C. Since these data are derived from human subjects, they are only available under restricted access, which can be requested for each dataset separately on the EGA website. Requestors must sign a data access agreement outlining the terms and conditions for data use and fill in a form specifying their research question. Requests will be processed within 1 week, and the data will be available for a maximum of 2 years unless an appeal for extension is submitted. Processed data are publicly available in ArrayExpress with the following identifiers: E-MTAB-13117 (CTCF ChIP-seq), E-MTAB-13118 (ATAC-seq), E-MTAB-13119 (H3K27ac ChIP-seq), E-MTAB-13120 (MCIP-seq), E-MTAB-13121 (RNA-seq), E-MTAB-13122 (Hi-C), E-MTAB-14060 (TF ChIP-seq). The remaining data are available within the Article, Supplementary Information, and Source Data that accompany this article. In addition, we have used publicly available data from the ENCODE⁷⁸ and FANTOM⁹³ consortia, as well as a single-cell RNA-seq dataset of hematopoietic cells obtained from the Gene Expression Omnibus (GEO) database under the accession code GSE149938¹⁴³. We also used Illumina Infinium MethylationEPIC data from T-ALL²⁵ (GSE147667) and AML²⁶ (GSE159907). Source data are provided with this paper.

Code availability

All software tools employed in this study are freely or commercially available (see Methods). R code used in the analysis of the data presented here can be found in Supplementary Code 1.

References

Shih, A. H., Abdel-Wahab, O., Patel, J. P. & Levine, R. L. The role of mutations in epigenetic regulators in myeloid malignancies. Nat. Rev. Cancer 12, 599–612 (2012).
Liu, Y. et al. The genomic landscape of pediatric and young adult T-lineage acute lymphoblastic leukemia. Nat. Genet. 49, 1211–1218 (2017).
Gröschel, S. et al. A single oncogenic enhancer rearrangement causes concomitant EVI1 and GATA2 deregulation in leukemia. Cell 157, 369–381 (2014).
Ottema, S. et al. Atypical 3q26/MECOM rearrangements genocopy inv(3)/t(3;3) in acute myeloid leukemia. Blood 136, 224–234 (2020).
Mansour, M. R. et al. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science 346, 1373–1377 (2014).
Jost, E. et al. Epimutations mimic genomic mutations of DNMT3A in acute myeloid leukemia. Leukemia 28, 1227–1234 (2014).
Valk, P. J. M. et al. Prognostically useful gene-expression profiles in acute myeloid leukemia. N. Engl. J. Med. 350, 1617–1628 (2004).
Wouters, B. J. et al. Distinct gene expression profiles of acute myeloid/T-lymphoid leukemia with silenced CEBPA and mutations in NOTCH1. Blood 110, 3706–3714 (2007).
Figueroa, M. E. et al. Genome-wide epigenetic analysis delineates a biologically distinct immature acute leukemia with myeloid/T-lymphoid features. Blood 113, 2795–2804 (2009).
Gebhard, C. et al. Profiling of aberrant DNA methylation in acute myeloid leukemia reveals subclasses of CG-rich regions with epigenetic or genetic association. Leukemia 33, 26–36 (2019).
Kelly, A. D. et al. A CpG island methylator phenotype in acute myeloid leukemia independent of IDH mutations and associated with a favorable outcome. Leukemia 31, 2011–2019 (2017).
Jones, P. A. & Baylin, S. B. The epigenomics of cancer. Cell 128, 683–692 (2007).
Greenberg, M. V. C. & Bourc’his, D. The diverse roles of DNA methylation in mammalian development and disease. Nat. Rev. Mol. Cell Biol. 20, 590–607 (2019).
Jones, P. A. Functions of DNA methylation: Islands, start sites, gene bodies and beyond. Nat. Rev. Genet. 13, 484–492 (2012).
Yin, Y. et al. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science 356, eaaj2239 (2017).
Ong, C.-T. & Corces, V. G. CTCF: an architectural protein bridging genome topology and function. Nat. Rev. Genet. 15, 234–246 (2014).
Wiehle, L. et al. DNA (de)methylation in embryonic stem cells controls CTCF-dependent chromatin boundaries. Genome Res. 29, 750–761 (2019).
Flavahan, W. A. et al. Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature 529, 110–114 (2016).
Khan, M., Siddiqi, R. & Naqvi, K. An update on classification, genetics, and clinical approach to mixed phenotype acute leukemia (MPAL). Ann. Hematol. 97, 945–953 (2018).
Khoury, J. D. et al. The 5th edition of the World Health Organization classification of haematolymphoid tumours: myeloid and histiocytic/dendritic neoplasms. Leukemia 36, 1703–1719 (2022).
Weinberg, O. K. et al. The international consensus classification of acute leukemias of ambiguous lineage. Blood 141, 2275–2277 (2023).
Coustan-Smith, E. et al. Early T-cell precursor leukaemia: a subtype of very high-risk acute lymphoblastic leukaemia. Lancet Oncol. 10, 147–156 (2009).
Zhang, J. et al. The genetic basis of early T-cell precursor acute lymphoblastic leukaemia. Nature 481, 157–163 (2012).
Alexander, T. B. et al. The genetic basis and cell of origin of mixed phenotype acute leukaemia. Nature 562, 373–406 (2018).
Touzart, A. et al. Epigenetic analysis of patients with T-ALL identifies poor outcomes and a hypomethylating agent-responsive subgroup. Sci. Transl. Med. 13, eabc4834 (2021).
Giacopelli, B. et al. DNA methylation epitypes highlight underlying developmental and disease pathways in acute myeloid leukemia. Genome Res. 31, 747–761 (2021).
The Cancer Genome Atlas Research Network, Ley, T. J. et al. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).
Li, J. F. et al. Transcriptional landscape of B cell precursor acute lymphoblastic leukemia based on an international study of 1,223 cases. Proc. Natl Acad. Sci. USA 115, E11711–E11720 (2018).
Arber, D. A. et al. The 2016 revision to the World Health Organization classification of myeloid neoplasms and acute leukemia. Blood 127, 2391–2405 (2016).
Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).
Bock, C. et al. DNA methylation dynamics during in vivo differentiation of blood and skin stem cells. Mol. Cell 47, 633–647 (2012).
Farlik, M. et al. DNA methylation dynamics of human hematopoietic stem cell differentiation. Cell Stem Cell 19, 808–822 (2016).
Ji, H. et al. Comprehensive methylome map of lineage commitment from haematopoietic progenitors. Nature 467, 338–342 (2010).
Bernstein, B. E. et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326 (2006).
Ohm, J. E. et al. A stem cell-like chromatin pattern may predispose tumor suppressor genes to DNA hypermethylation and heritable silencing. Nat. Genet. 39, 237–242 (2007).
Okamura, R. M. et al. Redundant regulation of T cell differentiation and TCRα gene expression by the transcription factors LEF-1 and TCF-1. Immunity 8, 11–20 (1998).
Skokowa, J. et al. LEF-1 is crucial for neutrophil granulocytopoiesis and its expression is severely reduced in congenital neutropenia. Nat. Med. 12, 1191–1197 (2006).
Felsenfeld, G. & Bell, A. C. Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature 405, 482–485 (2000).
Hark, A. T. et al. CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature 405, 486–489 (2000).
Wang, H. et al. Widespread plasticity in CTCF occupancy linked to DNA methylation. Genome Res. 22, 1680–1688 (2012).
Maurano, M. T., Wang, H., Kutyavin, T. & Stamatoyannopoulos, J. A. Widespread site-dependent buffering of human regulatory polymorphism. PLoS Genet. 8, e1002599 (2012).
Phillips-Cremins, J. E. et al. Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153, 1281–1295 (2013).
Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Scamurra, D. O., Davey, F. R., Nelson, D. A., Kurec, A. S. & Goldberg, J. Acute leukemia presenting with myeloid and lymphoid cell markers. Ann. Clin. Lab. Sci. 13, 496–502 (1983).
Kim, M. & Costello, J. DNA methylation: an epigenetic mark of cellular memory. Exp. Mol. Med. 49, e322 (2017).
Zhu, T. et al. A pan-tissue DNA methylation atlas enables in silico decomposition of human tissue methylomes at cell-type resolution. Nat. Methods 19, 296–306 (2022).
Bormann, F. et al. Cell-of-origin DNA methylation signatures are maintained during colorectal carcinogenesis. Cell Rep. 23, 3407–3418 (2018).
Xing, S. et al. Tcf1 and Lef1 transcription factors establish CD8+ T cell identity through intrinsic HDAC activity. Nat. Immunol. 17, 695–703 (2016).
Shan, Q. et al. Tcf1 and Lef1 provide constant supervision to mature CD8+ T cell identity and function by organizing genomic architecture. Nat. Commun. 12, 1–20 (2021).
Aivalioti, M. M. et al. PU.1-dependent enhancer inhibition separates Tet2-deficient hematopoiesis from malignant transformation. Blood Cancer Discov. 3, 444–467 (2022).
Hasemann, M. S. et al. C/EBPα is required for long-term self-renewal and lineage priming of hematopoietic stem cells and for the maintenance of epigenetic configurations in multipotent progenitors. PLoS Genet. 10, e1004079 (2014).
Avellino, R. et al. An autonomous CEBPA enhancer specific for myeloid-lineage priming and neutrophilic differentiation. Blood 127, 2991–3003 (2016).
Nora, E. P. et al. Targeted degradation of CTCF decouples local insulation of chromosome domains from genomic compartmentalization. Cell 169, 930–944.e22 (2017).
Lupiáñez, D. G. et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell 161, 1012–1025 (2015).
Guo, Y. et al. CRISPR inversion of CTCF sites alters genome topology and enhancer/promoter function. Cell 162, 900–910 (2015).
Rodríguez-Carballo, E. et al. The HoxD cluster is a dynamic and resilient TAD boundary controlling the segregation of antagonistic regulatory landscapes. Genes Dev. 31, 2264–2281 (2017).
Barutcu, A. R., Maass, P. G., Lewandowski, J. P., Weiner, C. L. & Rinn, J. L. A TAD boundary is preserved upon deletion of the CTCF-rich Firre locus. Nat. Commun. 9, 1444 (2018).
Kentepozidou, E. et al. Clustered CTCF binding is an evolutionary mechanism to maintain topologically associating domains. Genome Biol. 21, 1–19 (2020).
Chang, L.-H. et al. A complex CTCF binding code defines TAD boundary structure and function. bioRxiv. https://www.biorxiv.org/content/10.1101/2021.04.15.440007v1.
Weintraub, A. S. et al. YY1 is a structural regulator of enhancer-promoter loops. Cell 171, 1573–1588.e28 (2017).
Beagan, J. A. et al. YY1 and CTCF orchestrate a 3D chromatin looping switch during early neural lineage commitment. Genome Res. 27, 1139–1152 (2017).
Ghaleb, A. M. & Yang, V. W. Krüppel-like factor 4 (KLF4): What we currently know. Gene 611, 27–37 (2017).
Feinberg, M. W. et al. The Kruppel-like factor KLF4 is a critical regulator of monocyte differentiation. EMBO J. 26, 4138–4148 (2007).
Liu, X., Wen, X., Liu, H. & Xiao, G. Downregulation of the transcription factor KLF4 is required for the lineage commitment of T cells. Cell Res. 21, 1701–1710 (2011).
Di Giammartino, D. C. et al. KLF4 is involved in the organization and regulation of pluripotency-associated three-dimensional enhancer networks. Nat. Cell Biol. 21, 1179–1190 (2019).
Wei, Z. et al. Klf4 organizes long-range chromosomal interactions with the OCT4 locus in reprogramming and pluripotency. Cell Stem Cell 13, 36–47 (2013).
Shen, Y. et al. Inactivation of KLF4 promotes T-cell acute lymphoblastic leukemia and activates the MAP2K7 pathway. Leukemia 31, 1314–1324 (2017).
Filarsky, K. et al. Krüppel-like factor 4 (KLF4) inactivation in chronic lymphocytic leukemia correlates with promoter DNA-methylation and can be reversed by inhibition of NOTCH signaling. Haematologica 101, e249–e253 (2016).
Li, W. et al. Genome-wide analyses identify KLF4 as an important negative regulator in T-cell acute lymphoblastic leukemia through directly inhibiting T-cell associated genes. Mol. Cancer 14, 26 (2015).
Otani, J. et al. Structural basis for recognition of H3K4 methylation status by the DNA methyltransferase 3A ATRX-DNMT3-DNMT3L domain. EMBO Rep. 10, 1235–1241 (2009).
Ooi, S. K. T. et al. DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature 448, 714–717 (2007).
Viré, E. et al. The Polycomb group protein EZH2 directly controls DNA methylation. Nature 439, 871–874 (2006).
Neri, F. et al. Dnmt3L antagonizes DNA methylation at bivalent promoters and favors DNA methylation at gene bodies in ESCs. Cell 155, 121 (2013).
Gebhard, C. et al. Genome-wide profiling of CpG methylation identifies novel targets of aberrant hypermethylation in myeloid leukemia. Cancer Res. 66, 6118–6128 (2006).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Cavalcante, R. G. & Sartor, M. A. Annotatr: genomic regions in context. Bioinformatics 33, 2381–2383 (2017).
Dobin, A. et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Article CAS PubMed PubMed Central Google Scholar
Pham, T. H. et al. Dynamic epigenetic enhancer signatures reveal key transcription factors associated with monocytic differentiation states. Blood 119, e161–e171 (2012).
Article CAS PubMed Google Scholar
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Article CAS PubMed PubMed Central Google Scholar
Stovner, E. B. & Sætrom, P. Epic2 efficiently finds diffuse domains in ChIP-seq data. Bioinformatics 35, 4392–4393 (2019).
Article CAS PubMed Google Scholar
Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).
Article CAS PubMed PubMed Central Google Scholar
Díaz, N. et al. Chromatin conformation analysis of primary patient tissue using a low input Hi-C method. Nat. Commun. 9, 1–13 (2018).
Article ADS Google Scholar
Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Research 4, 1310 (2015).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Kramer, N. E. et al. Plotgardener: cultivating precise multi-panel figures in R. Bioinformatics 38, 2042–2045 (2022).
Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Gardiner-Garden, M. & Frommer, M. CpG islands in vertebrate genomes. J. Mol. Biol. 196, 261–282 (1987).
Fagerland, M. W. T-tests, non-parametric tests, and large studies: a paradox of statistical practice? BMC Med. Res. Methodol. 12, 1–7 (2012).
Sawilowsky, S. S. Very large and huge effect sizes. J. Mod. Appl. Stat. Methods 8, 597–599 (2009).
Ross-Innes, C. S. et al. Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature 481, 389–393 (2012).
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
Zhu, L. J. et al. ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 11, 1–10 (2010).
Sheffield, N. C. & Bock, C. LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor. Bioinformatics 32, 587–589 (2015).
Assenov, Y. et al. Comprehensive analysis of DNA methylation data with RnBeads. Nat. Methods 11, 1138–1140 (2014).
Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
Soneson, C., Love, M. I. & Robinson, M. D. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research 4, 1521 (2016).
Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817 (2012).
Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 25, 4.10.1–4.10.14 (2009).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Tate, J. G. et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Jaiswal, S. & Ebert, B. L. Clonal hematopoiesis in human aging and disease. Science 366, eaan4673 (2019).
Piñero, J. et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 48, D845–D855 (2020).
Tyner, J. W. et al. Functional genomic landscape of acute myeloid leukaemia. Nature 562, 526–531 (2018).
Kalender Atak, Z. et al. Comprehensive analysis of transcriptome variation uncovers known and novel driver events in T-cell acute lymphoblastic leukemia. PLoS Genet. 9, e1003997 (2013).
Chen, B. et al. Identification of fusion genes and characterization of transcriptome features in T-cell acute lymphoblastic leukemia. Proc. Natl Acad. Sci. USA 115, 373–378 (2017).
Neumann, M. et al. Mutational spectrum of adult T-ALL. Oncotarget 6, 2754–2766 (2015).
Landt, S. G. et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 22, 1813–1831 (2012).
Neumann, M. et al. Whole-exome sequencing in adult ETP-ALL reveals a high rate of DNMT3A mutations. Blood 121, 4749–4752 (2013).
Xiao, W. et al. PHF6 and DNMT3A mutations are enriched in distinct subgroups of mixed phenotype acute leukemia with T-lineage differentiation. Blood Adv. 2, 3526–3539 (2018).
Talevich, E., Shain, A. H., Botton, T. & Bastian, B. C. CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLoS Comput. Biol. 12, e1004873 (2016).
Soong, D. et al. CNV Radar: an improved method for somatic copy number alteration characterization in oncology. BMC Bioinformatics 21, 98 (2020).
Boeva, V. et al. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 28, 423–425 (2012).
Cingolani, P. et al. Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift. Front. Genet. 3, 35 (2012).
Haas, B. J. et al. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol. 20, 1–16 (2019).
Nicorici, D. et al. FusionCatcher: a tool for finding somatic fusion genes in paired-end RNA-sequencing data. Preprint at bioRxiv (2014).
Uhrig, S. et al. Accurate and efficient detection of gene fusions from RNA sequencing data. Genome Res. 31, 448–460 (2021).
Melsted, P. et al. Fusion detection and quantification by pseudoalignment. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/166322v1 (2017).
Davidson, N. M., Majewski, I. J. & Oshlack, A. JAFFA: high sensitivity transcriptome-focused fusion gene detection. Genome Med. 7, 1–12 (2015).
Ma, C., Shao, M. & Kingsford, C. SQUID: transcriptomic structural variation detection from RNA-seq. Genome Biol. 19, 1–16 (2018).
Ewels, P. A. et al. The nf-core framework for community-curated bioinformatics pipelines. Nat. Biotechnol. 38, 276–278 (2020).
Chen, X. et al. Fusion gene map of acute leukemia revealed by transcriptome sequencing of a consecutive cohort of 1000 cases in a single center. Blood Cancer J. 11, 1–10 (2021).
Wang, Y., Wu, N., Liu, D. & Jin, Y. Recurrent fusion genes in leukemia: an attractive target for diagnosis and treatment. Curr. Genomics 18, 378–384 (2017).
Huret, J. L. et al. Atlas of genetics and cytogenetics in oncology and haematology in 2013. Nucleic Acids Res. 41, D920–D924 (2013).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
Riemke, P. et al. Myeloid leukemia with transdifferentiation plasticity developing from T‐cell progenitors. EMBO J. 35, 2399–2416 (2016).
Barbie, D. A. et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature 462, 108–112 (2009).
Novershtern, N. et al. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 144, 296–309 (2011).
Laurenti, E. et al. The transcriptional architecture of early human hematopoiesis identifies multilevel control of lymphoid commitment. Nat. Immunol. 14, 756–763 (2013).
Xie, X. et al. Single-cell transcriptomic landscape of human blood cells. Natl. Sci. Rev. 8, nwaa180 (2021).
Fishilevich, S. et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database 2017, bax028 (2017).
Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975–978 (2017).
Sandelin, A., Alkema, W., Engström, P., Wasserman, W. W. & Lenhard, B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 32, D91–D94 (2004).
Trevino, A. E. et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053–5069.e23 (2021).
Janssens, J. et al. Decoding gene regulation in the fly brain. Nature 601, 630–636 (2022).
Granja, J. M. et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 53, 403–411 (2021).
Buenrostro, J. D. et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. Cell 173, 1535–1548.e16 (2018).
Bentsen, M. et al. ATAC-seq footprinting unravels kinetics of transcription factor binding during zygotic genome activation. Nat. Commun. 11, 1–11 (2020).
Mulet-Lazaro, R. et al. Allele-specific expression of GATA2 due to epigenetic dysregulation in CEBPA double-mutant AML. Blood 138, 160–177 (2021).
Speir, M. L. et al. The UCSC Genome Browser database: 2016 update. Nucleic Acids Res. 44, D717–D725 (2016).


Acknowledgements

We would like to thank the colleagues from the bone marrow transplantation group and the molecular diagnostics laboratory of the department of Hematology (E. Braakman, P.J.M. Valk) as well as collaborators from the department of Clinical Genetics (H.B. Beverloo) at the Erasmus University Medical Center for sample storage and for molecular and cytogenetic analysis of the leukemia cells. We are also grateful to Remco Hoogenboezem for assistance in developing bioinformatics tools and to the rest of our colleagues in the department of Hematology for input during work discussions. Furthermore, we thank Roberto Avellino for critically reading the manuscript. This work was funded by grants from the following organizations: (1) Dutch Cancer Foundation: EMCR 2015-7935 (R.D.), EMCR 2015-7550 (B.W.); (2) Leukemia & Lymphoma Society (LLS), Special Fellowship in Clinical Research: Grant #4317-16 (B.W.); (3) Deutsche Forschungsgemeinschaft SPP2202 Priority Program: GE 202/1-1 (C.G. and J.M.V.); (4) Medical Research Council: UK MC_UP_1605/10 (J.M.V.); (5) the Academy of Medical Sciences and the Department of Business, Energy and Industrial Strategy: APR3\1017 (J.M.V.); (6) German Cancer Aid (M.R.); (7) Wilhelm Sander Stiftung (M.R.); (8) Krebshilfe Antrag 111602 (M.R.).

Author information

These authors contributed equally: Bas J. Wouters, Ruud Delwel, Michael Rehli, Claudia Gebhard.

Authors and Affiliations

Department of Hematology, Erasmus MC Cancer Institute, Rotterdam, the Netherlands
Roger Mulet-Lazaro, Stanley van Herk, Aniko Sijs-Szabo, Claudia Erpelinck-Verschueren, Anita Rijneveld, Bas J. Wouters & Ruud Delwel
Oncode Institute, Utrecht, the Netherlands
Roger Mulet-Lazaro, Stanley van Herk, Claudia Erpelinck-Verschueren, Bas J. Wouters & Ruud Delwel
Department of Internal Medicine III, University Hospital Regensburg, Regensburg, Germany
Margit Nuetzel, Lucia Schwarzfischer-Pfeilschifter, Hanna Stanewsky, Ute Ackermann, Dagmar Glatz, Johanna Raithel, Alexander Fischer, Sandra Pohl, Michael Rehli & Claudia Gebhard
Max Planck Institute for Molecular Biomedicine, Muenster, Germany
Noelia Díaz & Juan M. Vaquerizas
Renewable Marine Resources Department, Institute of Marine Sciences (ICM-CSIC), Barcelona, Spain
Noelia Díaz
Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
Katherine Kelly & Christoph Plass
Department of Conservative Dentistry and Periodontology, University Hospital Regensburg, Regensburg, Germany
Sandra Pohl
MRC London Institute of Medical Sciences, London, United Kingdom
Juan M. Vaquerizas
Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, London, United Kingdom
Juan M. Vaquerizas
Medizinische Klinik und Poliklinik I, Universitätsklinikum Carl Gustav Carus, Dresden, Germany
Christian Thiede
Leibniz Institute for Immunotherapy (LIT), Regensburg, Germany
Michael Rehli & Claudia Gebhard


Contributions

The study was designed and the manuscript was written by R.M.-L., B.W., C.G., R.D., and M.R. Wet lab experiments were performed by S. van H., M.N., C.G., A.S., N.D., C.E.-V., L.S.-P., H.S., D.G., U.A., S.P., J.R., and A.F. Bioinformatic analyses were conducted by R.M.-L., K.K., J.M.V., C.G., and M.R. Patient samples and data were provided by A.S., A.R., C.T., C.P., C.G., R.D., and B.W.

Corresponding authors

Correspondence to Bas J. Wouters, Ruud Delwel, Michael Rehli or Claudia Gebhard.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Data 1–41

Supplementary Code 1

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Mulet-Lazaro, R., van Herk, S., Nuetzel, M. et al. Epigenetic alterations affecting hematopoietic regulatory networks as drivers of mixed myeloid/lymphoid leukemia. Nat Commun 15, 5693 (2024). https://doi.org/10.1038/s41467-024-49811-y


Received: 11 August 2023
Accepted: 19 June 2024
Published: 07 July 2024
DOI: https://doi.org/10.1038/s41467-024-49811-y
Springer Nature Limited

