Background

Congenital heart disease (CHD) is the most common form of birth defect and affects approximately 1% of live-born children [1]. It is also among the top five causes of death in children younger than 1 year [2]. Despite improved prenatal care and increased awareness of risk factors, the global incidence of CHD has steadily increased over the last five decades [3]. With advancements in medical and surgical interventions, most CHD patients can now survive into adulthood, resulting in a doubling of the prevalence rate in the aged population between 1990 and 2017 [4]. The inability to control the incidence of CHD highlights the importance of etiological studies.

Currently, about 20–30% of CHD cases can be genetically diagnosed with known CHD-causing genetic changes. These include 8–10% gross chromosomal anomalies/aneuploidy, 3–25% copy number variations, and 3–5% single gene variants [5]. Positive genetic diagnoses are more likely to be achieved in syndromic cases at either the chromosomal [6] or gene levels [7], although syndromic cases account for only 25% of total CHD cases [8]. Large genetic CHD cohort studies with whole exome sequencing (WES) from the Pediatric Cardiac Genomics Consortium (PCGC) suggest that overall, 8% of cases can be attributed to de novo autosomal dominant mutations, with syndromic cases more likely to be explained (up to 28% of syndromic vs 3% of isolated CHD cases explainable); inherited rare variants are implicated in only 1.8% of cases [9, 10]. Thus far, the etiology of the vast majority of CHD cases, especially sporadic isolated cases, is still poorly understood.

Although classic Mendelian inheritance patterns have been identified in some familial clusters, the overall 2–6% sibling or offspring recurrence risk of isolated CHD suggests that the majority of CHD cases are multifactorial in origin, ranging from multiple genetic alterations to gene-environmental interactions [5]. Gifford et al. provided a compelling example of the oligogenic origin of congenital heart disease (CHD) by discovering that the combined inheritance of heterozygous mutations in three genes (MKL2, MYH7, and NKX2-5) resulted in left ventricular noncompaction (LVNC) in both humans and mice [11]. Priest et al. adopted a more systemic approach in their study of atrioventricular septal defects (AVSD). They analyzed the total rare variants with classical inheritance patterns (de novo, homozygous, compound heterozygous) from 59 AVSD trios using protein interaction network analysis. Their findings identified protein interaction networks, particularly a pair of interacting collagen genes (COL2A1, COL9A1), enriched in the AVSD trios, providing support for oligogenic inheritance [12]. Although numerous variant-level genome-wide association studies have been conducted to identify genetic risk loci for CHD, there have been no reports on gene-level genome-wide assessments of rare inherited heterozygous mutations contributing to CHD.

N-ethyl-N-nitrosourea (ENU) is an alkylating agent that has been used for more than 30 years to induce mutations in mice that mimic spontaneous mutations in humans [13]. A large ENU-based recessive forward genetic screening study of CHD in mice identified double recessive mutations in Sap130 and Pcdha9, previously not associated with CHD, as a novel digenic origin of hypoplastic left heart syndrome (HLHS) [14]. The same genetic screen also revealed 91 recessive CHD mutations in 61 genes, half of which were related to cilia formation, with laterality defect syndromes being the most common manifestations [15]. This finding is consistent with that of a human CHD WES study in which, among all major CHD subgroups, only laterality defects were significantly enriched for damaging recessive genes, particularly cilia-related genes [10]. These findings suggest that recessive damaging mutations may underlie a distinct minority class of CHD, largely in the syndromic form of laterality defects.

Since almost all known genes related to isolated CHD and more than half of the known genes related to syndromic CHD exhibit an autosomal dominant pattern [7], a reduced dosage or gene expression level, rather than complete loss of gene products, is more likely the mechanism underlying most CHD cases. Individually, such heterozygous mutations may not cause CHD, but their presence in combination can lead to the development of CHD. To identify such disease-contributing heterozygous mutations, we employed a large-scale ENU-based forward dominant screen in mice to explore potential novel genetic risk factors of CHD. This study identified a large number of mutations in genes regulating early embryonic heart contractility as a novel mechanism contributing to CHD.

Methods

Mouse breeding and ENU mutagenesis

All animal experiments were performed in accordance with protocols approved by the Institutional Animal Care and Use Committee of Westlake University (approval 21-005-SHJ). Mice were maintained in a 12-h light/dark cycle and provided ad libitum access to food and water. ENU mutagenesis was performed as previously described [16]. Briefly, sexually mature (8-week-old) C57BL/6J males were administered an intraperitoneal injection of 90 mg/kg ENU (N-ethyl-N-nitrosourea, Sigma-Aldrich) once a week for two consecutive weeks. In total, 200 G0 mice were injected, and after a recovery period of 15 weeks to regain fertility, 88 mutagenized G0 males were mated with wild-type C57BL/6J female mice. At E18.5, pregnant females were sacrificed by cervical dislocation. Their G1 fetuses were subsequently removed via cesarean section. Immediately thereafter, the fetuses were euthanized by decapitation, and their hearts were excised for imaging analysis. ENU-treated males exhibiting signs of morbidity and loss of reproductive function were euthanized with carbon dioxide [17].

Lightsheet fluorescence microscopy (LFM) and cardiac phenotyping

Fetal hearts were imaged using the Zeiss Lightsheet Z.1 microscope. Embryos were harvested at E18.5, and hearts were dissected in phosphate-buffered saline (PBS), fixed overnight in a mixed solution of 10% neutral buffered formalin and 2.5% glutaraldehyde, then rinsed twice in PBS, and dehydrated in 50%, 75%, and 100% ethanol for 30 min each at room temperature (RT). Samples were transferred into a specially designed glass tube containing 100 μL of BABB solution (1:2 benzyl alcohol:benzyl benzoate) for 30 min to clear the sample. The glass tube was mounted into the sample chamber filled with 85% glycerol (RI ~1.45). Whole hearts were scanned for tissue autofluorescence using a 561-nm laser line with 5×/0.16 detection optics. 3D reconstruction of the image stacks and morphological analyses were performed with Imaris 9.3 software. Heart morphology was assessed in 3D mode independently by two trained personnel.

Mouse WES, WGS, and data processing

One hundred seventy CHD G1 fetuses and 52 normal G1 fetuses were whole exome sequenced at Novogene. Genomic DNA was captured using Agilent SureSelect Mouse All Exon V1 and sequenced using the Illumina Novaseq 6000 platform, with a minimum average of 100× target sequence coverage. Additionally, 550 CHD G1 and 559 normal G1 fetuses were sequenced at BGI using the MGI DNBSEQ-T series platform, with a minimum average of 60× whole genome sequence coverage. The reads were aligned to the C57BL/6J mouse reference genome (mm10) using BWA v0.7.17 [18]. Duplicate reads were removed with samblaster v0.1.26 [19], and the data were sorted with sambamba v0.8.0 [20]. For the whole genome sequencing samples, the exon region BAM file was extracted using the sambamba application slice parameter. Local realignments were generated, and base quality scores were recalibrated using GATK 4.2.0.0 [21] following the GATK Best Practices. All samples were called together with platypus/0.8.1 [22] to obtain a single vcf file, and variants were annotated with ANNOVAR [23] and the SIFT-4G [24] database. To further control the sequence quality, variants with genotype score < 80 or allele balance < 0.2 (for heterozygous genotypes only) were filtered out.
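The final quality filter in this step can be illustrated with a minimal sketch. It assumes that allele balance means the fraction of reads supporting the alternate allele; the `Genotype` record and its field names are hypothetical and are not taken from the Platypus output format.

```python
from typing import NamedTuple


class Genotype(NamedTuple):
    """Hypothetical per-sample genotype record (field names are illustrative)."""
    score: float      # genotype quality score
    ref_reads: int    # reads supporting the reference allele
    alt_reads: int    # reads supporting the alternate allele
    is_het: bool      # True for a heterozygous call


def passes_qc(gt, min_score=80, min_allele_balance=0.2):
    """Keep a genotype unless its score is < 80 or, for heterozygous
    calls only, its allele balance is < 0.2."""
    if gt.score < min_score:
        return False
    if gt.is_het:
        total = gt.ref_reads + gt.alt_reads
        if total == 0:
            return False
        # Assumption: allele balance = alternate reads / total reads.
        if gt.alt_reads / total < min_allele_balance:
            return False
    return True
```

Under this reading, a heterozygous call with 10 of 50 alternate reads sits exactly on the 0.2 threshold and is retained, whereas one with 5 of 50 is removed.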

Kinship analysis and variants filtering

Before conducting the kinship analysis, variants that occurred four times or more in all 1,331 mouse fetuses were removed to eliminate potential interference from background noise. This step aimed to focus on the variants most likely originating from ENU mutagenesis. The remaining variants were then used to quantify the inbreeding coefficient (IC) between pairs of samples in the GCTA v1.949 software [25]. An IC cutoff of 0.05 was applied to determine the relatedness between pairs of samples. Core families were identified, consisting of pairs of samples with an IC greater than 0.05. These core families were further condensed by merging families with any overlapping samples, resulting in the identification of 152 large families and 983 singletons out of a total of 1331 samples. To ensure diversity within the analysis, only one randomly selected sample from each large family was included. This process established a control group consisting of 532 normal fetuses and a case group consisting of 603 fetuses with malformed hearts. To focus specifically on ENU-induced variants, only variants that occurred once in the total 1135 samples were retained for the final analysis. This additional filtering step resulted in the removal of 5 samples. Consequently, the control group consisted of 532 samples, and the case group consisted of 598 samples. The variant filtering process is depicted in Additional file 1: Fig. S1.
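The merging of core families described above amounts to finding connected components among related sample pairs. The sketch below is one way to express that step, assuming the pairwise coefficients are already available as a dictionary; the function names and the input layout are illustrative and do not reflect the GCTA output format.

```python
import random


def group_families(samples, relatedness, cutoff=0.05):
    """Merge samples into families: any pair with a coefficient above the
    cutoff is linked, and linked groups sharing a sample are merged."""
    parent = {s: s for s in samples}

    def find(x):
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in relatedness.items():
        if value > cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    return list(groups.values())


def pick_representatives(groups, seed=0):
    """Keep one randomly chosen sample per family (singletons keep themselves)."""
    rng = random.Random(seed)
    return [rng.choice(group) for group in groups]
```

For example, with pairs a-b and b-c above the cutoff and d-e below it, the samples a, b, and c form one family while d and e remain singletons, so three samples would be carried forward.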

Human WES data processing

For human CHD samples, raw sequencing files (.sra) were converted into FASTQ files, and the quality of the sequencing reads was assessed using fastp [26]. The reads were aligned to the human reference genome (GRCh38/hg38) using the BWA-MEM algorithm of Burrows-Wheeler Aligner (BWA) v0.7.17 [18]. Duplicate reads were marked and removed after alignment using samblaster v0.1.26 [19], and BAM files were sorted using sambamba v0.8.0 [20]. The control WES files were downloaded from the 1000 Genomes Project [27] in CRAM format and converted to BAM format using samtools 1.14 [28]. After all BAM files were split by chromosome using sambamba according to the Exome-Agilent-V6.bed file, the GATK Best Practices workflows (4.2.0.0) were used to apply indel realignment and base quality recalibration [21, 29]. Single nucleotide variants and small indels were called with GATK HaplotypeCaller using the “-ERC GVCF” parameter, and the resulting per-chromosome GVCFs of all samples were then jointly genotyped and merged using GenotypeGVCFs. All mutations were annotated using ANNOVAR [23], dbSNP (v150), 1000 Genomes (August 2015), dbNSFP (41a), gnomAD (v3), and AlphaMissense [30].

Statistical analysis

All statistical analyses were conducted using Python 3.8.8.

Global variant burden test based on an expected mutational model

The expected mutation model was generated based on a hypothetical mutation model in which each base of the whole exome was subject to an equal chance of mutation, adjusted by the ENU mutational bias. Briefly, GTF files for the main gene transcripts of the GRCm38/mm10 genome were obtained from the UCSC genome browser. Where multiple transcript isoforms existed, only the longest transcript was taken into account. Genes of the olfactory receptor family, vomeronasal receptor family, KRTAP family, and taste receptor Tas1r and Tas2r families, as well as the Ttn gene, were excluded from the analysis due to their hyperpolymorphic nature. We then created a mutation simulation dataset containing each of the three possible single nucleotide substitutions for each base in the exome (Additional file 2: Table S3). The occurrence of each simulated nucleotide change was adjusted by multiplying it by an ENU mutagenesis bias factor determined for each possible substitution (Additional file 1: Fig. S2b). The mutation simulation dataset was then annotated by ANNOVAR for variant classification (synonymous, missense, and LOF including nonsense, splicing, frameshift, start loss, and stop loss) and pathogenicity prediction based on the SIFT-4G score. Damaging missense (D-Mis) variants were defined as those with a SIFT-4G score < 0.05. The total number of variants for each variant class (synonymous, LOF, D-Mis) was summed. The expected number of frameshift indels was estimated by multiplying the total number of simulated variants by the observed proportion of frameshift indels among all observed variants in the final control and case groups included in the analysis, which was determined to be 0.2%. The simulated number of frameshift indels was then added to the LOF class. In-frame indels were not considered in this analysis. The following formula was used to estimate the expected probability of each variant class per mouse (Pvc):

$$\text{Expected Pvc} = {\overline{\text{N}}}_{obs} \times \frac{\text{Nvc}}{\text{Nt}}$$

where \({\overline{\text{N}}}_{obs}\) is the average number of variants observed in each mouse (which is determined to be 59 in this study); Nvc is the total number of variants in each variant class of the simulation dataset; Nt is the number of total variants of the whole simulation dataset.
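The construction of the simulation dataset and the Pvc formula can be sketched on a toy sequence. This is an illustration under stated assumptions, not the pipeline used in the study: the `classify` callback stands in for the ANNOVAR/SIFT-4G annotation, the bias dictionary is a placeholder for the per-substitution ENU bias factors, and strand handling and frameshift indels are ignored.

```python
BASES = "ACGT"


def simulate_substitutions(sequence, bias):
    """List every possible single-nucleotide substitution for each base,
    weighted by a substitution-specific bias factor (default weight 1.0)."""
    records = []
    for pos, ref in enumerate(sequence):
        for alt in BASES:
            if alt != ref:
                records.append((pos, ref, alt, bias.get((ref, alt), 1.0)))
    return records


def class_totals(records, classify):
    """Sum the bias-adjusted occurrences per variant class (Nvc).
    `classify` is a hypothetical stand-in for variant annotation."""
    totals = {}
    for pos, ref, alt, weight in records:
        label = classify(pos, ref, alt)
        totals[label] = totals.get(label, 0.0) + weight
    return totals


def expected_per_mouse(n_obs_mean, n_class, n_total):
    """Expected Pvc = mean observed variants per mouse x Nvc / Nt."""
    return n_obs_mean * n_class / n_total
```

With a three-base sequence and a bias of 2.0 for A to T changes, the nine simulated substitutions carry a total weight of 10. If the class assigned to the first two positions sums to 7, the expected value per mouse at an average of 59 variants is 59 × 7/10 = 41.3.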

A one-tailed Poisson test was used to test for an excess of mutations over expectation. For each variant class, the expected count in the control or case group was obtained by multiplying Pvc by the total number of mice in the group and was compared with the observed number of variants of that class within the group.
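A one-tailed Poisson burden test of this kind can be sketched with the standard library alone. The sketch assumes that the expected count is Pvc multiplied by the number of mice in the group and that the p-value is the upper tail P(X ≥ observed); the function names are illustrative.

```python
import math


def poisson_upper_tail(observed, lam):
    """P(X >= observed) for X ~ Poisson(lam), with lam > 0.
    Terms are computed in log space so large expected counts do not underflow."""
    if observed <= 0:
        return 1.0
    total = 0.0
    k = observed
    while True:
        term = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        total += term
        # Past the mode the terms shrink monotonically; stop once negligible.
        if k > lam and term < 1e-16 * max(total, 1e-300):
            break
        k += 1
    return min(1.0, total)


def burden_p_value(observed, p_vc, n_mice):
    """Test for an excess of observed variants over the expected count."""
    return poisson_upper_tail(observed, p_vc * n_mice)
```

As a check on the tail definition, with an expected count of 2 the probability of observing at least one event is 1 − e^(−2), about 0.865.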

Global variant burden test based on the control variant distribution

The total number of variants in each variant class observed in controls was summed (Nvc). The expected probability of each variant class per mouse (Pvc) was derived by dividing the Nvc by the number of control mice. Burden test was conducted using Poisson statistics as described above to test for an excess of mutation in cases over controls.

Gene burden test based on the expected mutational model

From the mutation simulation dataset, the total number of variants for each variant class was summed for each gene (Additional file 2: Table S3). The following formula was used to estimate the expected probability of each variant class of each gene per mouse (Pvcg):

$$\text{Expected Pvcg} = {\overline{\text{N}}}_{obs} \times \frac{\text{Nvcg}}{\text{Nt}}$$

where \({\overline{\text{N}}}_{obs}\) is the average number of variants observed in each mouse (which is determined to be 59 in this study); Nvcg is the total number of variants of each gene in each variant class of the simulation dataset; Nt is the number of total variants of the whole simulation dataset.

To increase statistical power and minimize the risk of false positive discoveries, we removed genes from the analysis if the total number of mutation events, considering both controls and cases combined, was 5 or fewer. A one-sided binomial test was then used to test for an excess of mutations over expectation for each gene, based on the expected probability of each variant class of each gene per mouse (Pvcg), the total number of mice in the control or case group, and the observed number of variants of each class in each gene within the group. After p-values were generated for all genes, Storey’s q-value procedure [31] was used to control the false discovery rate (FDR) at 0.05.

The expected number of mutations for each gene is defined as Pvcg multiplied by the total number of mice in the control or case group. Enrichment score is defined as the ratio of the observed number of mutations and the expected number of mutations.
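The per-gene test and the enrichment score can be sketched as follows. The sketch assumes the binomial test takes the number of mice as the number of trials and Pvcg as the success probability; the function names are illustrative, and the Storey q-value step is not shown.

```python
import math


def binom_upper_tail(k, n, p):
    """One-sided binomial test: P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        # Log of C(n, i) * p**i * (1 - p)**(n - i), for numerical stability.
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(1.0, total)


def keep_gene(n_case_events, n_control_events, min_events=6):
    """Drop genes with 5 or fewer mutation events in cases and controls combined."""
    return n_case_events + n_control_events >= min_events


def enrichment_score(observed, p_vcg, n_mice):
    """Observed mutations divided by the expected number (Pvcg x mice in group)."""
    return observed / (p_vcg * n_mice)
```

For instance, a gene with Pvcg = 0.01 in a group of 100 mice has an expected count of 1, so 6 observed mutations would correspond to an enrichment score of 6.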

Gene burden test based on human case-control comparison

After co-calling of variants from all control and case exomes, a filter was applied to obtain variants that were both rare (MAF < 0.001) and damaging (annotated as LOF, or predicted to be pathogenic by AlphaMissense [30]). For each rare damaging variant, the total number of samples with identifiable genotype Ns (genotype score ≥ 20 and total reads ≥ 15; if the genotype is called as heterozygous, the variant reads ≥ 5) and the total number of samples with rare damaging variants from these identifiable genotypes Nv were determined within each sample group. All rare damaging variants were summed to the gene level within each sample group to give rise to the total number of samples with rare damaging variant per gene (Nvg). The median of the Ns of each gene (Nms) represented the number of samples that were examined for the presence of Nvg. Based on Nvg and Nms, one-sided Fisher’s exact statistics was then used to test for an excess of mutation for each gene between the case and control group.
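The case-control gene test can be sketched as a one-sided Fisher's exact test on a 2×2 table built from Nvg and Nms. This is a minimal illustration, assuming the table rows are cases and controls and the columns are carriers and non-carriers; truncating the median to an integer is a simplification made here, and the function names are illustrative.

```python
import math
from statistics import median


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]]:
    probability of seeing at least `a` carriers among cases, given the margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)
    upper = min(row1, col1)
    num = sum(math.comb(row1, x) * math.comb(n - row1, col1 - x)
              for x in range(a, upper + 1))
    return num / denom


def gene_test(case_nvg, case_ns, control_nvg, control_ns):
    """case_ns / control_ns: per-variant counts of samples with an identifiable
    genotype (Ns); their median (Nms) is the number of samples examined."""
    case_nms = int(median(case_ns))
    control_nms = int(median(control_ns))
    return fisher_exact_greater(case_nvg, max(case_nms - case_nvg, 0),
                                control_nvg, max(control_nms - control_nvg, 0))
```

In the smallest informative example, 3 carriers among 3 cases and 0 among 3 controls gives p = 1/20 = 0.05.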

Gene sets used for mouse variant enrichment analysis

The known CHD gene set was adapted from the Knowledgebase for Congenital Heart Disease-related Genes and Clinical Manifestations (http://chddb.fwgenetics.org/) [32], which contains 1124 genes manually curated from multi-cohort analyses for CHD, among which 1044 mouse orthologs could be mapped. The lists of SysCilia genes [33], cilia genes [34], chromatin-modifying genes [34], high heart expression genes (HHE) [9, 35], and low heart expression genes (LHE) [9, 35] were adapted from previous reports. HHE were the top 25% of genes expressed in E14.5 mouse hearts and LHE were the bottom 25% of genes expressed in E14.5 mouse hearts.

Gene set enrichment analysis

Statistically significant gene sets were input into Metascape (https://metascape.org/) [36] to obtain enriched GO biological processes and KEGG pathways, with q-values calculated using the Benjamini-Hochberg procedure [37]. For gene set enrichment against the MGI mammalian phenotype database, two files (All Genotypes and Mammalian Phenotype Annotations, and Mammalian Phenotype Vocabulary in OBO v1.2) were obtained from Mouse Genome Informatics (https://www.informatics.jax.org/) [38] to compile a genotype-phenotype association file. A hypergeometric distribution test was then used to perform the term enrichment analysis.
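The term enrichment step can be sketched as a hypergeometric upper-tail test applied to each phenotype term. The sketch assumes the association file has already been reduced to a mapping from term to annotated genes; the mapping layout, the choice of background, and the function names are assumptions made for illustration.

```python
import math


def hypergeom_upper_tail(k, population, annotated, drawn):
    """P(overlap >= k) when `drawn` genes are sampled without replacement
    from `population` genes, of which `annotated` carry the term."""
    denom = math.comb(population, drawn)
    upper = min(annotated, drawn)
    num = sum(math.comb(annotated, i) * math.comb(population - annotated, drawn - i)
              for i in range(k, upper + 1))
    return num / denom


def term_enrichment(gene_set, term_to_genes, background):
    """Return a p-value per term for the overlap between the gene set and
    the genes annotated with that term, restricted to the background."""
    background = set(background)
    gene_set = set(gene_set) & background
    results = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        overlap = len(gene_set & annotated)
        results[term] = hypergeom_upper_tail(
            overlap, len(background), len(annotated), len(gene_set))
    return results
```

With a background of 10 genes, a term annotating 2 of them, and a 2-gene set that matches both, the p-value is 1/C(10, 2) = 1/45.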

Permutation test and principal component analysis

We randomly selected 148 genes from the pool of ENU-induced mutated genes that have served as the basis for our case-enriched geneset. After conducting 10,000 permutations, we subjected these random gene sets to enrichment analysis against the MGI mammalian phenotype database, yielding p-values for each phenotype term. Using the p-values < 0.01 as a cutoff, we assigned the value of 1 to the significant term and 0 to the insignificant term for each geneset, and then performed principal component analysis (PCA) across 10,001 datasets (including our case-enriched gene set and 10,000 permutations) based on these assigned codes.

Results

ENU mutagenesis resulted in heart defects in G1

The process involved mating ENU-treated G0 males with wild-type females, resulting in the generation of G1 fetuses harboring multiple heterozygous de novo mutations derived from the mutagenized spermatogonial stem cells of the G0 males. At embryonic day 18.5 (E18.5), a total of 10,285 G1 fetal hearts were harvested and phenotyped using lightsheet fluorescence microscopy (LFM). The LFM technology enabled us to perform rapid scanning of the entire fetal heart in just 20 s per heart, achieving a three-dimensional resolution of 2.29 × 2.29 × 7.16 μm. Leveraging this high-throughput and high-resolution imaging technique, we successfully identified 1109 G1 fetuses with diverse heart defects, leading to an overall defect rate of 10.8% (Fig. 1 and Table 1). The most frequently observed defect types were bicuspid aortic valve (BAV) and muscular ventricular septal defect (mVSD), each accounting for 30% of the total defects. Perimembranous ventricular septal defect (pmVSD) was the next most common defect and was observed in 11% of the fetuses. Outflow tract defects, including double outlet right ventricle (DORV), persistent truncus arteriosus (PTA), and transposition of the great arteries (TGA), accounted for 5% of the total defects. Atrial septal defect (ASD), in particular, secundum ASD which accounts for 70% of all ASD, is a common heart malformation in humans. However, the interatrial communication is normally present during fetal life, and consequently, the prenatal diagnosis of secundum ASD is rarely possible. We therefore did not identify any secundum ASD in this prenatal screen. Three primum ASDs were identified. Only a small percentage (2.4%) of heart defects were accompanied by visible external defects such as microcephaly and cleft palate (Additional file 1: Table S1). The distribution pattern of heart defect subtypes observed in this screen closely mirrored that of congenital heart defects observed in human patients [39, 40].

Fig. 1

Representative heart defects reconstructed by lightsheet fluorescence microscopy

Table 1 Heart defect rates in the G1 fetuses

Characteristics of ENU-induced de novo mutations

We performed WES/WGS on a randomly selected subset of fetuses. We included 720 fetuses with heart defects but without visible external defects (case group) and 611 litter-matched fetuses with normal hearts (control group) (Additional file 1: Fig. S1). The case and control groups exhibited similar sequencing metrics, ensuring a valid comparison (Additional file 1: Table S2). To distinguish rare ENU-induced variants from the background variations, variants that occurred four times or more in all 1331 samples were removed. Since there is a small chance of a clonal relationship among offspring, which would interfere with the burden analysis, a kinship analysis using an inbreeding coefficient cutoff of 0.05 was applied to remove related samples. We further filtered for the variants that occurred only once in all 1135 samples to ensure that we only analyzed purely ENU-induced de novo mutations. Finally, 598 cases and 532 controls were retained, each presumably derived from an independently edited single spermatogonial stem cell.

Regarding the variant classification, we found that nonsynonymous single nucleotide variants (SNVs) were the most prevalent, accounting for 69% of all variants. Synonymous SNVs accounted for 24% of the variants. The remaining 7% were predominantly loss-of-function variants such as splicing, stop gain, and indels (Additional file 1: Fig. S2a). We observed similar distributions of variants across all chromosomes in both the case and control groups, with adenine and thymine being the predominant edited bases (Additional file 1: Fig. S2b and S2c). These findings suggest that there were no systemic differences in variant calling or annotation between the cases and controls. A total of 15807 coding genes were affected by ENU at least once in the 1130 case and control samples. This finding represents a coverage of approximately 76% of the whole mouse exome (Additional file 1: Fig. S2c). On average, the G1 progeny exhibited 59 (66,928/1130) exonic ENU-induced de novo variants per fetus. Notably, this number is 53 times greater than the observed de novo mutation rate in humans, as reported in previous studies [9]. This increased mutation load provides an opportunity to explore and identify risk genes associated with heart defects.

Increased mutation burden in mice with CHD

To assess the difference in mutation burden between the case and control groups, we developed an expected mutational model under the ENU treatment. This model involved simulating all possible nucleotide changes to the whole mouse protein coding sequence and deriving the expected frequency of each possible variant based on several factors, including the average of 59 exonic ENU-induced de novo variants per sample, transcript length, and ENU mutation bias (Additional file 2: Table S3).

All the variants were classified into distinct classes, such as synonymous, missense, and loss-of-function (LoF) variants, which included stop gain, stop loss, start loss, splicing, and frameshift variants. Damaging missense mutations (D-Mis) were defined as missense mutations predicted to be damaging by SIFT-4G, the only prediction algorithm available for mice. The expected and observed numbers of variants in each variant class were subsequently compared for the control and case groups using a one-tailed Poisson test [9] (Table 2). As expected, the mutation rates in all variant classes were accurately predicted in the control group. However, we observed a 1.13-fold excess of LoF mutations in the case group across all the genes, indicating a greater burden of LoF mutations in the cases compared to the controls (p = 3.9 × 10⁻¹⁰). Furthermore, damaging mutations, including LoF and D-Mis variants in genes known to be related to CHD [32], and LoF mutations in genes highly expressed in the developing heart [9, 35] were markedly enriched in CHD cases but not in controls. In contrast, genes with low expression in the heart [9, 35] were not enriched with any genetic variants in either the case or control groups. Interestingly, although previous studies have implicated recessive mutations in cilia genes in both mouse [15] and human CHD [34], we did not observe significant enrichment of heterozygous mutations of cilia genes in this dominant screen. However, we found an increased incidence of LoF mutations in chromatin genes [34] in the cases but not in the controls (Additional file 1: Table S4).

Table 2 Variant class enrichment by expectation analysis

The identical genetic backgrounds of the case-control mice and uniform sequence coverage allowed us to directly compare the mutation rates between the cases and controls using one-tailed Poisson tests [35] (Additional file 1: Table S5). In line with the previous analysis based on the expectation model, the cases had a significant excess of LoF mutations across all genes and genes with high heart expression. Additionally, CHD-related genes were enriched in cases for the damaging variants (LoF and D-Mis). These findings indicate that simultaneously disrupting single alleles of multiple genes in the germ line can lead to heart defects in the offspring.

Heart contraction genes enriched in mice with CHD

For each gene, we considered all qualifying variants within the specified variant classes and summed their allele counts in the case group, control group, and expected mutational model separately. We then performed a one-sided binomial test to determine whether there was a significant deviation in the frequency of damaging mutations in each gene from the expected distribution. After correcting for multiple testing (FDR<0.05, Storey’s q-value procedure [31]), a total of 148 and 25 genes were significantly enriched in the cases and controls, respectively (Additional file 2: Table S6 and S7). Among all these case-enriched genes, Notch1 [41, 42], Fbn2 [43], Prrl2 [44], and Rere [45] are established CHD genes with an autosomal dominant inheritance pattern in humans. No genes enriched in the control mice are known to be associated with CHD with an autosomal dominant inheritance pattern.

To gain further insights into the biological mechanisms underlying CHD, we conducted GO term and KEGG pathway enrichment analyses for the 148 genes overrepresented in mice with heart defects (Fig. 2 and Additional file 2: Table S8). Genes involved in regulating heart contraction, such as calcium ion transmembrane transport, action potential, and muscle structure development, were among the most enriched pathways. Notable examples include Ryr2 and Ryr3, which encode ryanodine receptors involved in excitation-contraction coupling, Atp2a2 and Atp2b2, which encode subunits of ATP-driven Ca2+ ion pumps critical for cardiac relaxation, and Cacna1e and Cacna1s, which encode calcium voltage-gated channel subunits required for calcium entry. Abcc9 and Kcnma1, which encode subunits of potassium channels critical for regulating membrane potential, were also enriched.

Fig. 2

GO term enrichment analysis for genes enriched in mice with heart defects. Pathways with q-value < 0.05 (Benjamini-Hochberg method) were considered significant

Interestingly, many genes involved in neuronal function and development were also highly overrepresented in the cases. These included genes involved in neurite growth and axon guidance (Slit2, Chl1, Celsr2), neuronal migration (Wdr47, Kif26a), and neurotransmitter secretion (Stxbp5, Stxbp5l). In contrast, our analysis did not identify any significant enrichment of pathways or processes for genes with increased mutations in the control group. These findings highlight the importance of genes involved in heart contractility and potential neuronal regulation of early cardiac functions in the development of CHD.

To further characterize the genes overrepresented in mice with heart defects, we submitted the case-enriched 148 genes to the MGI mammalian phenotype term enrichment analysis. The results confirm a significant enrichment for abnormal channel response, abnormal cardiovascular morphology, impaired muscle contractility, and abnormal neurological response (Additional file 1: Fig. S3a). Furthermore, to ascertain that the observed association between the case-enriched geneset and these phenotypes is statistically robust and not merely a reflection of the overall mutation characteristics in our screen, we performed a permutation test from the pool of ENU-induced mutations from which the 148 case-enriched genes were derived. The analysis revealed that our case-enriched geneset was significantly distinct from the random 10,000 permutation datasets (p < 2.2e−16, Hotelling’s T-squared test [46]) (Additional file 1: Fig. S3b). Lastly, to further substantiate that the case-enriched geneset is specifically associated with cardiovascular system and nervous system phenotypes, we plotted a density map of all p-values for these two terms across the 10,001 datasets (Additional file 1: Fig. S3c and S3d). The findings demonstrate that the p-value for our case-enriched gene set falls within the top 0.01% of all p-values for the nervous system phenotype, and top 0.17% for the cardiovascular system phenotype, when ranked from smallest to largest. Taken together, the association of case-enriched genes with cardiac contraction and neuronal function and development is specific.

Heart contraction-related genes enriched in human CHD

We obtained WES/WGS data from 3406 CHD probands from the US National Heart, Lung, and Blood Institute (NHLBI) Pediatric Cardiac Genomics Consortium (PCGC). After excluding aortic arch patterning defects that were not found in our mouse screen, we obtained a sample size of 1457 probands with CHD. Out of these probands, 1333 also had WES or WGS data available for their parents. The defect types included VSD (15%), pulmonary stenosis (15%), TGA (12%), ASD (10%), Tetralogy of Fallot (9%), aortic stenosis (9%), BAV (4%), and others. WES of 2675 control subjects were obtained from the 1000 Genomes Project [27]. Variants were co-called from all bam files of cases and controls by GATK as described in the Methods.

In the gene burden analysis, we summed the number of all qualifying variants of each gene in each variant class for the 1457 cases and 2675 controls, respectively. Damaging variants were defined as LoF or predicted to be pathogenic by AlphaMissense [30]. We used a one-sided Fisher’s exact test to identify genes with significant differences in the frequency of rare damaging mutations (LoF + D-Mis, MAF < 0.001) between the cases and controls. Because the small sample size and the small number of mutation events in most genes resulted in generally modest p-values in Fisher’s exact test, we chose not to apply multiple testing corrections and directly used p-value < 0.05 as the significance threshold. Since this approach may increase the potential for false positive findings, we compared the mutation tolerability of the 373 genes identified to be overrepresented in the cases and the 432 genes overrepresented in the controls (Additional file 3: Table S9 and S10). The comparison revealed that the genes associated with CHD were functionally less tolerant to damaging mutations, indicating their potential role in the development of CHD. As shown in Fig. 3a, based on constraint scores from the gnomAD database, the observed/expected scores of LoF variants of these case-enriched genes were statistically significantly lower than those of genes enriched in controls. Accordingly, the probability of loss-of-function intolerance score (pLI) was higher in the case-enriched genes (Fig. 3b). Missense variants were also significantly depleted in case-enriched genes compared with control-enriched genes (Fig. 3c, d). These results indicate that the case-enriched genes are less tolerant to damaging mutations than the control-enriched genes, further supporting their potential role in CHD pathogenesis.

Fig. 3

Pathway and process enrichment analysis for human CHD. a The ratio of the observed/expected (oe) number of loss-of-function variants in genes enriched in cases and controls. b pLI score for genes enriched in cases and controls. c The ratio of the observed/expected (oe) number of missense variants in genes enriched in cases and controls. d Missense Z score for genes enriched in cases and controls. e Pathway and process enrichment analysis for genes enriched in the cases. f Pathway and process enrichment analysis for genes enriched in the controls

Enrichment analysis of the case-enriched genes revealed that cognition and nervous system development were among the top enriched cellular processes (Fig. 3e, Additional file 3: Table S9 and Table S11). These included genes involved in axon growth and guidance (PLXND1, SEMA3B, SEMA3D, SEMA6A, ULK2), genes encoding neurotrophic factors and transcription factors critical for neuron differentiation (ATOH1, NOTCH1, LHX2, NTF4), and genes involved in synaptogenesis and neurotransmission (SLITRK1, LRTM2, CLSTN2). Genes involved in muscle contraction and development, such as the calcium voltage-gated channel subunit CACNA1S and genes required for myofibril assembly (MYH11, TNNT1, NRAP, and OBSCN), were also enriched in cases. No genes related to heart contraction or nervous system development were enriched in the control group (Fig. 3f, Additional file 3: Table S10 and S12).

To explore potential causal genetic factors for individual CHD probands, we searched for digenic gene sets characterized by the concurrent occurrence of mutations in the same probands or in the same mice exhibiting heart defects. The following criteria were applied to identify these digenic gene sets: (1) both genes in the pair must have rare damaging mutations, as previously defined, in at least one mouse and at least two human CHD probands from the 1333 trios; (2) the gene pair should not have concurrent rare damaging mutations in the control mice; (3) the gene pair should not have concurrent rare damaging mutations in either parent of the proband; (4) the gene pair should not have concurrent rare damaging mutations in the 2675 control subjects from the 1000 Genomes Project. Based on these criteria, 101 candidate digenic gene sets were identified for future validation (Additional file 3: Table S13).
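The four criteria can be read as a set-based filter over gene pairs, sketched below under several assumptions. Mouse and human gene symbols are taken to be already mapped to a common ortholog name. Criterion (1) is read as requiring co-occurrence of both genes in the same individual, and criterion (3) as excluding any pair that co-occurs in a single parent. All function names, sample identifiers, and gene labels are hypothetical.

```python
from itertools import combinations

def cooccurring_pairs(carriers):
    """Map each unordered gene pair to the samples with damaging variants in both genes.

    `carriers` maps a sample ID to the set of genes with rare damaging variants.
    """
    pairs = {}
    for sample, genes in carriers.items():
        for pair in combinations(sorted(genes), 2):
            pairs.setdefault(pair, set()).add(sample)
    return pairs

def candidate_digenic_pairs(probands, parents, affected_mice, control_mice,
                            controls, min_probands=2, min_mice=1):
    """Apply criteria (1)-(4) from the text to every co-occurring gene pair."""
    human = cooccurring_pairs(probands)
    mouse = cooccurring_pairs(affected_mice)
    # Criteria (2)-(4): pairs seen together in any unaffected individual.
    excluded = (set(cooccurring_pairs(control_mice))
                | set(cooccurring_pairs(parents))
                | set(cooccurring_pairs(controls)))
    return sorted(
        pair for pair, samples in human.items()
        if len(samples) >= min_probands             # criterion (1), human
        and len(mouse.get(pair, ())) >= min_mice    # criterion (1), mouse
        and pair not in excluded
    )

# Hypothetical toy data.
probands = {"P1": {"A", "B", "C"}, "P2": {"A", "B"}, "P3": {"C", "D"}}
parents = {"P1_father": {"A"}, "P1_mother": {"B", "C"}, "P2_father": {"B"}}
affected_mice = {"M1": {"A", "B"}, "M2": {"C", "D"}}
control_mice = {"W1": {"C", "D"}}
controls = {"K1": {"A"}, "K2": {"B"}}

result = candidate_digenic_pairs(probands, parents, affected_mice,
                                 control_mice, controls)
```

In this toy example only the pair (A, B) survives: it co-occurs in two probands and one affected mouse, and never in a parent, control mouse, or control subject.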

Discussion

The molecular genetics of CHD has long been a challenging puzzle to solve. Based on the total CHD recurrence rate of 5% and an incidence rate of 1% in the general population [5], a simple calculation suggests that genetic factors may contribute to approximately 75% of CHD cases. However, currently known genetic factors such as copy number variations, aneuploidy, de novo mutations, and transmitted variants only account for approximately 30% of CHD cases overall [47].

Genome-wide association studies (GWAS) have been conducted to identify common susceptibility loci for CHD. However, these studies often suffer from small cohort sizes, low reproducibility, and small effect sizes of identified loci [47, 48]. A recent meta-analysis by Yu et al. [49] addressed some of these issues by conducting a large-scale analysis of 4597 cases and 50,745 control individuals from four CHD cohorts. Sixteen novel loci, including 12 rare noncoding variants with moderate or large effect sizes, were identified. These loci were found to disrupt transcription factor binding sites involved in cardiac development or disrupt physical contact with key genes in cardiac development [49]. This study reproduced the well-known pattern of rare variants with large effect sizes and common variants with small effect sizes. While these loci provide valuable insights into the molecular mechanisms of CHD and confer overall risk for CHD, the percentage of CHD cases substantially influenced by each specific variant is likely to be small. As an alternative to GWAS, gene-level burden testing has been developed to condense multiple rare inherited variants into individual genes before conducting association studies [50, 51]. However, to the best of our knowledge, there have been no reports of exome-wide gene burden testing in either animal genetic screens or human CHD studies.

In this study, a saturating ENU mutagenesis screen in mice led to the identification of a large number of genes that were previously unknown to be associated with CHD. Notably, all these variants were present in a heterozygous state in the affected animals, consistent with the oligogenic inheritance model. GO term and KEGG pathway enrichment analysis revealed that heart contraction and neuronal genes were the most significantly enriched in mice with CHD. This finding was consistent with subsequent findings from a human case-control gene burden test.

This study employed LFM as a tool for screening heart defects in mice at a large scale. LFM provides high resolution and rapid imaging speeds (less than 20 s per heart), making it highly effective for 3D reconstructions, particularly when compared to more traditional methods such as micro-CT, MRI, and stacked histology images. This technique enabled us to identify subtle heart defects, including BAV, which comprised 40% of the detected anomalies. The Zeiss Z1 LFM model we utilized is optimized for imaging mouse hearts from E14 to postnatal day 7. Specimens outside this developmental window may not benefit from the same level of resolution at the 5× objective or may not fit within the imaging chamber. Similar to other imaging techniques that focus on anatomical features, LFM does not directly provide insights into functional deficits. Conditions such as aortic or pulmonary stenosis require additional hemodynamic data for a more complete assessment. Defects like VSD or outflow tract (OFT) misalignment are more straightforward to identify, while assessing the extent of left ventricular non-compaction (LVNC) can be more subjective. In our study, we applied a non-compacted to compacted myocardium ratio greater than 4 as a criterion for diagnosing LVNC. This stringent criterion yielded an LVNC incidence of 0.09% in our sample, which may be an underestimate. Given that the overall incidence of heart defects in this study was roughly tenfold higher than human CHD rates, proportional scaling of the 0.076% prevalence of LVNC in human newborns reported by Kock et al. [52] would predict an incidence of about 0.76%; the observed LVNC incidence is therefore disproportionately low.

In this screen, the average litter size at E18.5 was recorded at 7, a figure that aligns closely with the standard litter size of 6–8 pups observed in wild-type, untreated C57BL/6J mice [53]. Only 0.8% of fetuses were found nonviable at the time of harvest at E18.5. Given that all identified mutations are paternally inherited and present in a heterozygous state, embryonic lethality was infrequent. However, it is noteworthy that fetuses exhibiting severe cardiac malformations may succumb postnatally.

It is interesting to note that, except for a very small number of genes such as Notch1, the majority of the genes identified through the mouse screen and human CHD burden test are not traditionally considered classical monogenic CHD-causing genes based on family linkage analysis. Furthermore, most of these genes do not cause heart defects or severely impact adult heart function in mice when only one copy of the gene is deleted (Additional file 2: Table S6). One possible explanation for the involvement of these heart contraction genes in CHD pathogenesis is that they may exhibit haploinsufficiency specifically during a critical window of heart development when the developing heart is most vulnerable to hemodynamic disruption. Doppler ultrasound imaging of early-stage mouse embryos has revealed that from the onset of heartbeat (E8.0) through E14.5, which is a critical period of cardiac morphogenesis, there is a progressive increase in heart rate, peak velocities, and cardiac output [54, 55]. This contractile change is accompanied by a steady increase in the expression of cardiac contractile proteins and subunits of various sarcolemma or sarcoplasmic reticulum ion pumps and channels [56], leading to changes in electrophysiology and calcium handling [57]. Based on these observations, it is possible that a decreased dosage of a specific contraction-related gene during a certain stage of heart development may transiently disrupt the balance between gradual myocyte maturation and the increasing metabolic demands, resulting in a temporary disruption of cardiac function, such as heart rate, rhythm, or contractility. This functional impairment might eventually subside after full maturation of the heart. This hypothesis needs to be tested by knocking out one copy of individual candidate cardiac contraction genes and studying the impact on early cardiac function and on CHD risk.

Another interesting finding is the association of neuronal genes with CHD. The intricate relationship between neural regulation and early cardiac function and morphogenesis is not yet fully elucidated. Innervation of the cardiac conduction system, primarily originating from neural crest cells (NCCs) of neuroectodermal lineage, follows intricate migratory pathways guided by a complex interplay of factors, including neurotrophic molecules, axon guidance proteins, differentiation signals, and survival cues [58]. Notably, many of the genes involved in these processes were identified in our studies of both mouse models and human subjects. Anatomical studies in mouse embryos indicate that parasympathetic neurons expressing the vesicular acetylcholine transporter (VAChT) and sympathetic neurons expressing tyrosine hydroxylase (TH) are present near the venous pole and the dorsal mesocardial connection, respectively, by E12.5 [59]. These positions align closely with the developing atrioventricular and sinus nodes. However, the functionality of these early neural elements is uncertain, with conventional wisdom suggesting that functional cardiac innervation emerges later in fetal development, after morphogenesis [60, 61]. A careful investigation of the early developmental status of the cardiac conduction system, in conjunction with an analysis of early cardiac rhythm, would be needed in genetic models of compromised NCC migration, differentiation, and innervation.

Contrary to prevailing beliefs, both human and mouse embryonic heart rates have been demonstrated to react to cholinergic and adrenergic stimulation during the morphogenesis period, at mouse E12.5 and prior to human week 8 of gestation [61,62,63]. Mice deficient in catecholamines, specifically those lacking dopamine β-hydroxylase [64] or tyrosine hydroxylase [65], exhibit cardiovascular failure and begin dying as early as E11.5. Consistent with this, cardiac intrinsic catecholamine-producing cells have been localized predominantly to the dorsal venous valve and atrioventricular canal regions from E11.5 [66]. This highlights the critical role of β1-adrenergic receptor signaling in maintaining fetal heart rate during morphogenesis, particularly in response to hypoxia-induced bradycardia [62]. Given that the final closure of interventricular communications occurs around E13.5 [67] and that tri-leaflet semilunar valve formation and remodeling continue beyond this stage [68], it is plausible that disruptions to cardiac contraction due to neuronal or cardiac intrinsic adrenergic inadequacy during this critical window could perturb normal hemodynamics, and abnormal hemodynamics is known to be an important risk factor for cardiac malformation [69].

Another possible explanation for these observations is that certain genes annotated to have a function in nerve conduction may also regulate action potentials in cardiac muscle. For example, the R-type calcium channel Cacna1e is expressed in both the heart and the central nervous system. Ablation of Cacna1e has been shown to cause arrhythmia in isolated prenatal mouse hearts [70], and mutations in CACNA1E have been associated with developmental and epileptic encephalopathy in humans [71]. Similarly, reduced activity of Atp2a2 (a SERCA Ca2+-ATPase responsible for translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen) led to a significant reduction in single action potential-driven Ca2+ signals and synaptic exocytosis [72] and impaired cardiac contractility and relaxation in mice [73]. Mutation of ATP2A2 is associated with a skin disorder as well as neuropsychiatric disorders and heart failure [74]. From a genetic perspective, this finding may explain why neurodevelopmental disorders and arrhythmias are prevalent comorbidities in survivors of CHD [75, 76].

It remains uncertain how mutations in genes regulating cardiac contraction contribute to the risk of CHD. Mutations in genes encoding cardiac ion channels and intracellular calcium handling proteins are associated with congenital arrhythmia syndromes [77]. Although arrhythmia is a common comorbidity in patients with structural heart defects [78], definitive monogenic causes of CHD attributable to calcium handling genes remain less well established. We hypothesize that heterozygous loss-of-function mutations in ion channel and calcium handling genes may induce cardiac arrhythmias during certain stages of fetal heart development. However, hemodynamic disturbance alone may not be sufficient to cause morphological malformations unless it occurs in the context of mutations in other susceptibility genes, such as Notch1, Klf2, and Yap, which are known to be involved in the endocardial response to flow shear stress and to mediate the process of endocardial-to-mesenchymal transition [79]. For instance, transiently inducing bradycardia in mouse embryos at E9.5 through pharmacological inhibition of the rapid component of the delayed rectifier potassium current (IKr) with dofetilide, or blocking L-type calcium channels with verapamil, does not result in heart defects on its own. However, when these pharmacological agents are combined with a heterozygous Notch1 mutation, over 50% of fetuses exhibit various heart defects due to abnormal endocardial-to-mesenchymal transition, highlighting the multifactorial nature of CHD [80]. For most candidate contraction genes, the additional susceptibility factors are yet to be found. Thus, simultaneous hemi-knockout of the candidate digenic gene sets identified in this study would be necessary to test this hypothesis in future experiments.

One limitation of this mouse screen study is the relatively small sample size. While the study managed to cover 76% of all mouse coding genes, the burden-based testing results may be biased toward larger genes, potentially leading to an underpowered statistical analysis for smaller yet functionally important genes. Based on our expectation model and an average of 59 hits per mouse, it is estimated that 10,000 independent G1 mice would be required to cover 80% of all coding genes with at least 5 hits per gene. This suggests the need for larger-scale studies to achieve a more robust statistical analysis and capture the full spectrum of genetic variations associated with CHD.
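A coverage estimate of this kind can be sketched with a simple Poisson model. This is an illustrative assumption and not necessarily the expectation model used in the study: each gene receives hits at a rate proportional to a relative mutational target size, and the expected coverage is the mean probability of a gene receiving at least the required number of hits. The function names and the toy gene weights are hypothetical; only the 59 hits per mouse, the 10,000 G1 mice, and the threshold of 5 hits come from the text.

```python
from math import exp, factorial

def poisson_tail(lam, k):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))

def expected_coverage(n_mice, hits_per_mouse, gene_weights, min_hits=5):
    """Expected fraction of genes with at least `min_hits` hits.

    `gene_weights` are relative mutational target sizes (for example coding
    lengths); hits are assumed to be spread across genes in proportion to them.
    """
    total_hits = n_mice * hits_per_mouse
    total_weight = sum(gene_weights)
    tails = (poisson_tail(total_hits * w / total_weight, min_hits)
             for w in gene_weights)
    return sum(tails) / len(gene_weights)

# Hypothetical target sizes for 20,000 genes; not real gene lengths.
toy_weights = [300, 900, 1500, 3000, 12000] * 4000

coverage_1k = expected_coverage(1_000, 59, toy_weights)
coverage_10k = expected_coverage(10_000, 59, toy_weights)
```

Because small genes receive proportionally fewer hits, they dominate the uncovered fraction in such a model, which is the size bias noted above.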

Conclusions

The mouse forward genetic screen identified numerous genes that contribute to an increased risk of CHD when present in the heterozygous mutant state. Notably, genes involved in regulating cardiac contraction and nervous system development and function were significantly enriched in CHD cases. These findings align with the results obtained from gene-based burden testing of human CHD probands, which further emphasized impaired heart contraction as a previously underappreciated risk factor for CHD. The candidate digenic gene sets identified in this study hold promise for shedding light on the complex genetics underlying CHD and should be further investigated in future studies.