Abstract
Autism Spectrum Disorder (ASD) is genetically complex, with ~100 copy number variants and genes involved. To establish more definitive genotype-phenotype correlations in ASD, we searched genome sequence data and the literature for recurrent, predicted-damaging sequence-level variants affecting single genes. We identified 18 individuals from 16 unrelated families carrying a heterozygous guanine duplication (c.3679dup; p.Ala1227Glyfs*69) occurring within a string of 8 guanines (genomic location [hg38]g.50,721,512dup) affecting SHANK3, a prototypical ASD gene (0.08% of ASD-affected individuals carried the predicted p.Ala1227Glyfs*69 frameshift variant). Most probands carried de novo mutations, but five individuals in three families inherited the variant from a parent with somatic mosaicism. We scrutinized the phenotype of p.Ala1227Glyfs*69 carriers, and while everyone (17/17) formally tested for ASD carried a diagnosis, there was variable expression of core ASD features both within and between families. Defining such recurrent mutational mechanisms underlying an ASD outcome is important for genetic counseling and early intervention.
Introduction
Autism Spectrum Disorder (ASD) is a heterogeneous condition, both in clinical presentation and in terms of the underlying etiology. Individuals with ASD are increasingly being seen in clinical genetics1,2. More than 100 genetic disorders that can exhibit features of ASD (e.g., Fragile X, Phelan-McDermid, and Rett syndromes)3, dozens of rare susceptibility genes (e.g., NLGN, NRXN, SHANK family genes), and copy number variation (CNV) loci (e.g., 1q21.1 duplication, 15q11-q13 duplication, 16p11.2 deletion) have been identified, which combined can facilitate a molecular diagnosis in ~5–40% of ASD cases4,5,6,7. The likelihood of a genetic finding in ASD is dependent on the complexity of the phenotype (e.g., idiopathic or syndromic, with or without intellectual disability)8,9, the genomic technology used (e.g., microarrays, exome sequencing, genome sequencing, or combinations thereof)10, as well as the annotation pipeline and “gene lists” used for interpretation11,12.
There are examples of how understanding the genetic subtypes of ASD can assist early identification enabling earlier behavioral intervention, and informing prognosis, medical management, and assessment of familial recurrence risk13,14. Moreover, genomic data promise to facilitate pharmacologic-intervention trials through stratification based on pathway profiles15,16. To support these applications, there is a growing interest in performing robust genetic analyses, often in families and in unique populations, linked to deep phenotyping17,18,19.
The largest datasets available for genotype/phenotype correlations in ASD studies are based on CNV assessment, since microarrays became the first-tier clinical diagnostic test20,21. The most relevant finding from this vast literature is that even for recurrent CNVs (i.e., genomic disorders) involved in ASD, which typically affect the same genes, there is variable expression of phenotypes relevant to the core features of autism and of other medical features22,23,24,25.
More recently, genotype and phenotype studies of sequence-level variation (single-nucleotide variants, or SNVs, and insertion/deletion, or indel, events) affecting individual genes are starting to reveal clinical correlations in ASD. For example, loss-of-function variants in the SCN2A sodium channel gene impair glutamatergic neuronal excitability, leading to ASD and/or intellectual disability, while gain-of-function variants potentiate excitability, leading to infantile-onset seizure phenotypes26. Different germline dominant-acting mutations in the phosphatase and tensin homolog (PTEN) gene found in ASD lead to an increased average head circumference in children27. Loss-of-function variants in the chromodomain helicase DNA-binding protein 8 (CHD8) gene are also found in overgrowth and intellectual disability forms of ASD28. Despite some progress in resolving genotype-phenotype correlations, the vast genetic complexity and variable expressivity of genes involved in ASD continue to confound most predictive studies.
Following a genotype-first approach, here we initially searched available ASD-specific, controlled-access, genome-wide sequence databases, such as MSSNG (https://research.mss.ng) and the Simons Simplex Collection (SSC) (https://www.sfari.org/resource/sfari-base), as well as our own in-house data (available in the next MSSNG data release), to identify recurrent sequence-level damaging variants (de novo loss-of-function or missense variants predicted to be damaging based on the American College of Medical Genetics guidelines29) affecting the same site (genomic location) in the same gene in different families. The database searches were then followed by a literature survey to identify additional individuals reported to have the same variant. In our most compelling finding, we identified a mutational ‘hotspot’ in a string of 8 Gs in exon 21 (p.Ala1227Glyfs*69) of the SHANK3 gene that was present in 17 individuals from 15 unrelated families with ASD, as well as one individual with several autistic features and Phelan-McDermid Syndrome (but who was not tested for ASD). The individuals identified in both the ASD-specific databases and the published manuscripts had various details available describing the phenotype, which we have summarized. We were able to contact the families that are described for the first time in this paper to gather additional information. Using these available data, we assessed the intra- and inter-familial phenotypic variation (as well as all other genetic information) within these individuals and discuss the findings in the context of genotype-phenotype comparison, including variable expression of ASD core symptoms and related features.
Results
Identification of the recurrent p.Ala1227Glyfs*69 variant
To achieve the most comprehensive genomic representation (difficult-to-sequence exons, splice site boundaries) for variant detection, we initially examined the Autism Speaks MSSNG whole-genome sequencing (WGS) cohort (https://research.mss.ng/) for recurrent mutations. This cohort comprises 11,359 samples, including 5102 affected individuals and 3567 with family data, typically belonging to trios or quads (two parents and two affected children). Secondly, we tested the Simons Simplex Collection (SSC) WGS collection (https://www.sfari.org/resource/simons-simplex-collection/), which comprises 9,205 samples, including 2419 affected individuals and 2393 with family data (typically two parents, one affected child, one unaffected child). Previous studies have extensively reported on MSSNG6,17,30,31 and SSC32,33. Probands from both cohorts met the criteria for ASD based on scores from standardized diagnostic tools, typically the Autism Diagnostic Observation Schedule (ADOS)34 and the Autism Diagnostic Interview–Revised (ADI-R)35, and/or the diagnosis was supported by clinical criteria. Many individuals were also assessed with standardized measures of intelligence (IQ), including verbal and nonverbal ability, language, social behavior, adaptive functioning, and physical measurements6,32,33. All of these phenotype data are available from the respective databases.
From the genome sequences analyzed, our most interesting finding identified five probands in MSSNG (four males and one female) from four families and one proband in SSC (male) carrying a heterozygous guanine duplication in SHANK3 (NCBI: NM_033517.1; ENSEMBL: ENST00000262795.5; c.3679 or c.3676 depending on the transcript) (Table 1; the reference sequence NM_033517.1 was selected as the appropriate transcript for this study as this was the reference sequence used in the original publication of this variant in Durand et al.36). We also found other recurrent sequence-level de novo heterozygous damaging missense variants in the PTEN, CAMK2A, SPTAN1, MECP2, and CSNK1E genes, but in each of these instances no more than two unrelated individuals were found in the combined MSSNG and SSC data (Supplementary Material; Table S1).
The discovery of this recurrent guanine duplication variant in SHANK3 was confirmed using Sanger sequencing (Fig. 1). We then scanned the literature, including using VariCarta37, and found that this same guanine duplication was reported in 12 probands affected by ASD4,36,38,39,40,41,42, and in one proband with an ASD score in the borderline range, Phelan-McDermid syndrome, significantly delayed language, and speech and visual-motor deficits38. We carefully examined all genotypes and found that one was the same individual in the SSC cohort (14470.p1)40; therefore, we removed this duplicate individual. Considering the new cases reported here and the cases reported in the literature, the p.Ala1227Glyfs*69 variant has been observed in a total of 18 cases from 16 families, identified using different genome-testing approaches (Table 2). Nearly all of these probands (17/18) were ascertained for ASD, although the general phenotype, as discussed below, varies somewhat among individuals (Table 3; Fig. 2). We also detected one female individual with ASD (with mild intellectual disability) carrying a de novo G deletion (leaving 7 Gs) at this same site (c.3679del p.Ala1227Profs*57).
Genome annotation of the p.Ala1227Glyfs*69 variant
The SHANK3 guanine duplication is located within a segment of 8 Gs on chromosome 22q13 at genomic location [hg38]g.50,721,505dup or g.50,721,512dup, depending on the position at which this variant is annotated within the guanines (Table 1; Fig. 2). Some tools annotate the first G as the duplication, and others annotate the final G (Supplementary Material; Fig. 3). The sequencing technology might also affect the variant annotation, with Sanger sequencing conventionally placing the G duplication at the 3′ end of the run as the first point of amino acid change, and Next Generation Sequencing usually left-aligning the variant. Independent of the position of the base insertion in the 8 Gs, the frameshift starting in exon 21 results in a new reading frame ending with a stop codon at position 69, causing a truncation lacking the C-terminal region (Fig. 3). We also confirmed that both exome sequencing and WGS reliably captured this 8-G genomic segment in the short-read sequence (see Methods).
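The annotation ambiguity described above can be illustrated with a minimal sketch. It uses a toy sequence (hypothetical, not the SHANK3 reference) and two illustrative helper functions, and shows that duplicating any single base within a homopolymer run yields the same allele, so the reported position can legitimately range from the first to the last base of the run.

```python
from typing import Tuple


def duplication_position_range(ref: str, pos: int) -> Tuple[int, int]:
    """Return the leftmost and rightmost 1-based positions at which a
    single-base duplication at `pos` can be equivalently reported."""
    base = ref[pos - 1]
    left = pos
    while left > 1 and ref[left - 2] == base:
        left -= 1
    right = pos
    while right < len(ref) and ref[right] == base:
        right += 1
    return left, right


def duplicate_base(ref: str, pos: int) -> str:
    """Return the sequence after duplicating the base at 1-based `pos`."""
    return ref[:pos] + ref[pos - 1] + ref[pos:]


# Toy reference (hypothetical flanks around a run of 8 guanines).
toy_ref = "ACT" + "G" * 8 + "CAT"

# Start from an arbitrary G inside the run (1-based position 6).
left, right = duplication_position_range(toy_ref, 6)

# Every placement within the run produces the same mutated sequence.
alleles = {duplicate_base(toy_ref, p) for p in range(left, right + 1)}

print(left, right, len(alleles))
```

In this toy example the left-aligned and right-aligned positions differ by 7, the same offset as between g.50,721,505dup and g.50,721,512dup, which is why normalizing to a single convention is needed before counting recurrences across tools.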
Segregation and population frequency of the recurrent p.Ala1227Glyfs*69 variant
All the probands identified in this study carried de novo variants, with the exception of five individuals. One family with two brothers, first reported in the initial SHANK3 ASD-discovery paper36, inherited the variant from their mother, who was found to be mosaic. Two siblings within the MSSNG cohort (MSSNG00342-003 and MSSNG00342-004) inherited the variant from their father, who was also shown to be mosaic (Table 2). In this latter case, the variant was only present in 8 of 50 reads in the father’s WGS data and was verified using a TA cloning kit (Invitrogen cat number 45-0046). Proband 1-1047-003 also seems to have inherited the variant from his mother, who appears to be a somatic mosaic: the variant was present in 1 of 32 reads of her WGS data. Exome sequencing analysis was also performed in this mother, with the variant being observed in 2 of 110 reads. To search for additional potentially relevant somatic mutations43, we tested the original alignment files in both cohorts using DeNovoGear’s dng-call method for the SHANK3 locus44, using 0.8 as the posterior probability of a de novo mutation (ppDNM), but we did not find any other candidates. Considering the families studied in MSSNG and SSC (our most trusted datasets), 6/7,521 (0.08%) ASD-affected individuals, from 5/6,681 (0.07%) families, carried the p.Ala1227Glyfs*69 variant. Fisher’s exact test of the association between the frequency of heterozygous individuals in ASD cases and in control population databases gives a P value of 0.029.
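The counts in this section can be checked with simple arithmetic. The sketch below recomputes the carrier frequencies and the fraction of variant-supporting reads in the mosaic parents; the dictionary labels are illustrative, and the comparison with a ~0.5 read fraction for a constitutional heterozygous variant is a general expectation rather than a value reported in the text.

```python
# Affected individuals per cohort, as stated in the Results.
affected_mssng = 5102
affected_ssc = 2419
affected_total = affected_mssng + affected_ssc

# Carrier frequency among ASD-affected individuals and among families.
carrier_pct = 100 * 6 / affected_total
family_pct = 100 * 5 / 6681

# Variant-supporting reads / total reads in the mosaic parents.
mosaic_reads = {
    "father of MSSNG00342 siblings (WGS)": (8, 50),
    "mother of 1-1047-003 (WGS)": (1, 32),
    "mother of 1-1047-003 (exome)": (2, 110),
}
variant_read_fraction = {
    label: alt / total for label, (alt, total) in mosaic_reads.items()
}

print(f"{carrier_pct:.3f}% of individuals, {family_pct:.3f}% of families")
for label, fraction in variant_read_fraction.items():
    # A constitutional heterozygous variant is expected near 0.5.
    print(f"{label}: {fraction:.3f}")
```

All three parental read fractions fall well below 0.5, which is the pattern that points to mosaicism rather than a constitutional heterozygous genotype.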
Consequences of p.Ala1227Glyfs*69 on the SHANK3 protein
Nonsense mutations and frameshifts in SHANK3 can lead to reduced expression, and SHANK3-deficient neurons were found to have an altered phospho-proteome that may explain their decreased dendritic spine density45. However, SHANK3 mRNA is still expressed in truncation mutant-containing induced pluripotent stem cells (iPSCs)46, and truncated SHANK3 proteins may have dominant-negative effects in neurons47,48. We therefore explored the consequences of p.Ala1227Glyfs*69 on the SHANK3 protein. We annotated the positions of amino acids to which the variant is mapped according to ENSEMBL and the UCSC genome browser. Using the DISOPRED3 predictor49 and the consensus of eight predictors from MobiDB-lite50, we identified where the mutation falls with respect to intrinsically disordered regions (IDRs) of the protein, which may influence protein folding and binding51. In both predictors, the position of interest was found to be embedded within a large IDR, which maps to multiple isoforms (Fig. 3B). Mutations that create frameshifts and stop codons in this region of SHANK336,52 truncate two proline-rich binding sites for Homer and Cortactin (Fig. 3A) and affect function, including altering neuronal morphology in cell-based experiments46,47. The SHANK3 protein serves as a scaffold to connect membrane receptors to the actin cytoskeleton in the postsynaptic density (PSD), a protein-rich sub-compartment considered to be a biomolecular condensate formed by phase separation53,54 due to multivalent interactions46. In each of the isoforms, these truncations are expected to impair canonical PSD formation and stability.
The variant isoforms were also analyzed using Feature Analysis of Intrinsically Disordered Regions, a tool that identifies the presence of consensus protein recognition motifs in IDRs55,56, and using PScore57, which predicts phase separation propensity via IDR planar pi-contacts (Fig. 3C; Supplementary Material; Fig. S2). A number of specific short linear interaction motifs were found to be altered. Of particular interest is the increase in SH3 domain class I-binding motifs, given that SHANK3 is known to interact with numerous SH3 domains. The variants significantly increase the number of arginine-glycine and arginine-arginine dipeptide instances, which are associated with mRNA binding and phase separation, and increase the cysteine content of the sequence. A reduction in SHANK3 protein due to the frameshift (e.g., through nonsense-mediated decay; discussed below) could also affect the phase separation of the PSD, which is known to be concentration dependent58.
p.Ala1227Glyfs*69 as a pathogenic variant
The p.Ala1227Glyfs*69 variant is classified in ClinVar as “Pathogenic for ASD, NDD, and others” and is exceptionally rare or absent in control populations (ClinVar; https://www.ncbi.nlm.nih.gov/clinvar/variation/208759/). In the gnomAD v2.1.1 dataset59, which uses GRCh37 (hg19) as the reference genome, it has an allele frequency of 16/160,994 alleles = 0.000099 (0.0099%). In ALFA60, this variant is also reported in 0.02% of control European samples. However, this variant is not present in gnomAD v3, the 1000 Genomes Project (which uses hg38 as a reference genome), TOPMed61, two unpublished pediatric control datasets from our group (INOVA and CHILD), the Personal Genome Project Canada62, or the Medical Genome Reference Bank63. In combination, this suggests that the presence of the variant in gnomAD v2.1.1 and ALFA might be due to low-quality sequencing, with the preliminary description being corrected in gnomAD v3. It is also noteworthy that ~1/100 people will have ASD, so it would be expected to find p.Ala1227Glyfs*69 variant carriers in control populations. Based on our findings described here, they would likely have ASD, but additional studies will be required to further assess this.
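As a back-of-envelope check on these figures, the sketch below recomputes the gnomAD v2.1.1 allele frequency and estimates how many carriers a population sample of that size might contain. The estimate rests on simplifying assumptions that are not claims of the study: that carriers occur only among individuals with ASD, at the 6/7,521 rate observed in MSSNG and SSC, that ASD prevalence is 1/100, and that the sample is an unselected draw from the general population.

```python
# gnomAD v2.1.1 allele count and allele number, as quoted in the text.
gnomad_allele_count = 16
gnomad_allele_number = 160_994
gnomad_allele_frequency = gnomad_allele_count / gnomad_allele_number

# Assumptions (for illustration only, see the paragraph above).
asd_prevalence = 1 / 100
carrier_rate_in_asd = 6 / 7521

# Expected carrier rate in an unselected population under these assumptions.
expected_population_carrier_rate = asd_prevalence * carrier_rate_in_asd

# Two alleles per diploid individual at an autosomal site.
individuals_sampled = gnomad_allele_number // 2
expected_carriers = expected_population_carrier_rate * individuals_sampled

print(f"allele frequency: {gnomad_allele_frequency:.6f}")
print(f"expected carriers in {individuals_sampled} people: {expected_carriers:.2f}")
```

Under these assumptions the expected number of carriers in a sample of this size is below one, so the estimate is sensitive to sampling noise and to how the reference cohort was ascertained.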
We analyzed the genomic conservation of this variant with GERP64, UCSC PhyloP, and phastCons for primates, placental mammals, and 100 vertebrates65. GERP identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA but did not occur because the element has been under functional constraint. The p.Ala1227Glyfs*69 variant has a GERP score of 5.2 (p = 0), suggestive of a large deleterious effect66. The PhyloP score was 0.6 for primates, 1.35 for mammals, and 2.13 considering 100 vertebrates, suggesting high evolutionary conservation. The phastCons scores were also higher than 0.98 for primates, mammals, and vertebrates, which indicates strong negative selection at this site.
Genotype and phenotype correlation
In all 17 p.Ala1227Glyfs*69 carriers evaluated for ASD, the diagnosis was confirmed by review of the gold-standard ASD diagnostic tests available in the databases or as reported in the original manuscripts. The majority of participants described are reported to have an intellectual disability, defined as an IQ score below 70 and impairments in adaptive functioning, although the spectrum of severity is wide (Table 3; Fig. 2). Four individuals were ascertained for Phelan-McDermid Syndrome; three of these were among the 17 who received a formal ASD diagnosis, and one was never assessed for autism. Language deficits are also prevalent and often severe. We were cautious about making claims on other associated conditions, as they have not been universally and systematically ascertained. However, hypotonia and gait abnormalities are common, which is also consistent with animal model data67. Seizures were reported in 3/18 participants. Other neurodevelopmental concerns include ADHD, anxiety, Developmental Coordination Disorder, and mood disorders. Gastrointestinal distress and sleep dysfunction were also reported. Last, both dysmorphia and other organ anomalies were reported (conductive hearing loss and coronary artery fistula). Within pairs of siblings sharing a variant, there is a similarity of phenotype, with some variability in the severity of the intellectual disability.
Different de novo mutations in SHANK3 have also been associated with other developmental/neuropsychiatric disorders and genetic syndromes such as schizophrenia47,68 and Phelan-McDermid Syndrome (PMS)69. The majority of children diagnosed with PMS also have ASD, and both conditions are often associated with intellectual and language delay, hypotonia, seizures, and sleep disorders, although children with PMS also often have other organ involvement. We also examined the whole genomes from the MSSNG and SSC p.Ala1227Glyfs*69 carriers and assessed for other clinically relevant variants that could be contributing to the varying phenotypic presentation, but none were identified. Additionally, no other clinically relevant variants were highlighted in those individuals described in the literature36,38,69,70,71.
To evaluate whether common genetic variants may be contributing to the ASD phenotype in the p.Ala1227Glyfs*69 SHANK3 variant carriers, we calculated the ASD polygenic risk score (PRS) for all accessible individuals of European ancestry in MSSNG (db6) and SSC. PRS in the probands analyzed in this study varied between −1.167 and 15.606 (Table 2), showing no clear pattern between the presence of the clinically significant SHANK3 variant and the polygenic risk from common variants. For comparison, PRS in all subjects with autism in MSSNG and SSC ranges between −18.580 and 20.626.
Discussion
Our data indicate that 17/17 carriers (from 15 independent families) of the p.Ala1227Glyfs*69 variant affecting SHANK3 who have been formally tested carry a diagnosis of ASD. Our analysis did not identify any other obvious rare or common genetic variants, or combinations thereof, in the genomes of these individuals that could be contributing to the phenotypes reported in these individuals. Given the nature of neurobehavioral complexity, perhaps not surprisingly, there is phenotypic heterogeneity exhibited amongst p.Ala1227Glyfs*69 carriers, which is a hallmark of autism72,73, as well as other related brain disorders that may share overlapping clinical features and contributory susceptibility genes74,75. It is instructive for future “genotype-first” queries that the discovery of this recurrent p.Ala1227Glyfs*69 variant was missed in our early analyses. It was only detected here upon careful consideration of the different naming schemes of the various isoforms (and exons within them) in SHANK3, which also varied between different software tools, as well as the various genome builds being compared against (Table 1)76,77.
In addition, we searched for p.Ala1227Glyfs*69 SHANK3 variants in unpublished data from the SPARK cohort41. Among 8744 ASD-affected individuals for whom sequencing data from both parents were available, the variant was detected in two male individuals, both de novo. The variant was also detected in three out of 13,156 ASD-affected individuals (two males and one female) for whom parental sequences were not available and thus inheritance could not be determined. In addition, from a private database we identified a female teen with ASD who, based on the Vineland, would be described as severely affected, with severe language delay and severe global developmental delay. As highlighted on continuous measures of emotional difficulties (CBCL), she also presents with attention difficulties. This individual was not included in Table 3 since gold-standard ASD measures were not available and this phenotype description is based on the available assessments. We mention these data only to demonstrate that the variant is found in other collections, as would be expected, and we await the presentation of more detailed phenotype data from these participants.
Two independently created murine models with an insertion of a guanine nucleotide into the analogous mouse base pair position, which we refer to here as Shank3InsG3680, have also demonstrated changes in cellular, circuit, and behavioral phenotypes67,78 (Supplementary Material; Table S2). Specifically, these Shank3InsG3680 mouse models demonstrated changes to baseline neurotransmission and/or impairments in long-term depression (LTD) and long-term potentiation (LTP), the synaptic basis of learning and memory. Overall, homozygous Shank3InsG3680 +/+ mice exhibited more significant changes than heterozygous Shank3InsG3680 mice, suggesting that one normal Shank3 copy may be sufficient to support some of its function.
Regional differences in synaptic deficits and synaptic composition were observed, and the extent of the impact may have been modulated by other Shank family genes. In the adult hippocampus, expression of the reversible Shank3InsG3680 variant cassette67 produced a truncated Shank3 protein and loss of the major high molecular weight isoforms at the synapse. This was associated with impaired hippocampal mGluR-dependent LTD, intact LTP, and changes to baseline NMDA receptor (NMDAR)-mediated synaptic function. In the striatum, Zhou et al.78 showed a significant decrease in levels of Shank3 mRNA in the Shank3InsG3680 strain compared with the wild type, suggesting a reduced level of mRNA through nonsense-mediated decay. This finding suggests that the InsG3680 variant results in a near-complete loss of SHANK3 protein, concomitant with synaptic transmission deficits in juvenile and adult homozygous mutant Shank3InsG3680 +/+ mice. Post-translational modifications, modulated by nitric oxide, were also found in both young and adult Shank3InsG3680 +/+ mice.
In assessments of general cognitive function, Shank3InsG3680 +/+ mice showed mild spatial learning impairments in the Morris Water Maze task and motor learning deficits in the accelerating rotarod task, while heterozygous mice did not67. ASD-associated behaviors in these two models also showed mixed outcomes in both social interaction impairments and repetitive behaviors that, similar to human assessments, may be dependent on age and gender. Speed et al.67 reported statistically different effects between male and female adult mice in some of their assessments. This group did not observe social interaction deficits in the three-chamber task with mixed-sex adult mutant mice, nor did they observe repetitive behaviors; instead, their data suggested an aversion to novel objects. However, in large all-male cohorts, Zhou et al.78 showed deficits in social behaviors in both juvenile and adult mice. In addition, in adults there was increased anxiety, repetitive grooming behaviors, and sensory processing differences78. On balance, the mouse data seem to generally recapitulate the learning impairments and behavioral differences seen in patients with the p.Ala1227Glyfs*69 SHANK3 variant.
Highly penetrant alleles such as p.Ala1227Glyfs*69 in neurodevelopmental disorders are under severe negative selection and are constantly being removed from the population79,80. However, recurrent mutations are always being added to the gene pool and while typically occurring randomly, the intrinsic81 and extrinsic characteristics82 may also have an influence83. Experimental investigations have shown that guanine bases can be targets for oxidative damage in DNA, while mutability in other bases is more variable84. Moreover, the locus under study is within 8 guanines, which constitutes a homopolymer run (HR). HRs are sequences with six or more identical nucleotides and are associated with >10-fold enrichment of mutation compared to the genomic average85. It is noteworthy that there are three other G homopolymer runs in SHANK3, but no recurrent variants were found at these sites.
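The definition of a homopolymer run used above (six or more identical nucleotides) can be turned into a short scanning routine. The sketch below is illustrative only: the function name and the toy sequence are hypothetical, and it is not the procedure used to survey SHANK3 in this study.

```python
import re
from typing import List, Tuple


def homopolymer_runs(seq: str, min_len: int = 6) -> List[Tuple[int, int, str]]:
    """Return (1-based start, length, base) for every run of at least
    `min_len` identical nucleotides in `seq`."""
    # One base followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (match.start() + 1, len(match.group(0)), match.group(1))
        for match in pattern.finditer(seq.upper())
    ]


# Toy sequence (hypothetical): an 8-G run, a 6-A run, and a 5-G run
# that is too short to count as a homopolymer run.
toy_seq = "ACGT" + "G" * 8 + "TTC" + "A" * 6 + "C" + "G" * 5 + "T"

runs = homopolymer_runs(toy_seq)
print(runs)
```

Applied to a gene sequence, such a scan lists candidate sites where slippage-type single-base insertions or deletions are more likely, which can then be checked for recurrent variants.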
The CpG content of DNA has also been shown to influence the mutation rate in non-CpG-containing sequences, suggesting that intrinsic properties of DNA sequences may be more important than the chromosomal environment in determining mutation rates and genome integrity. Evidence indicates that because of the propensity for methyl-CpGs to deaminate and produce mismatches, it is plausible that error-prone repair mechanisms may have a role in hypermutability. CpG methylation might also have epigenetic effects by promoting chromatin states that make DNA more susceptible to mutations86.
Although exceedingly rare (0.075% frequency in the ASD families studied by WGS), the finding that this p.Ala1227Glyfs*69 variant in SHANK3 is, so far, concordant with an ASD diagnosis, and that it will surely continue to sporadically recur in the population, has important implications for genetic counseling. It will also be important to continue to search for the p.Ala1227Glyfs*69 variant in SHANK3 to see if it confers risk in other disorders, including perhaps under a multiple-variant model87. Defining a specific mutational mechanism underlying an ASD outcome may also focus strategies for the development of therapeutic interventions.
Methods
Genome sequence analysis
We searched ASD-specific genomic databases, in which the participants had a diagnosis of ASD upon recruitment, for damaging de novo sequence-level variants affecting exactly the same genomic location in different families. A variant was defined to be damaging if it caused loss of function (stop gain, frameshift, or canonical splice site-disrupting) or was a predicted deleterious missense variant based on American College of Medical Genetics guidelines29. Initially, we examined rare (frequency less than 0.001 in gnomAD and 1000 Genomes) de novo variants identified from MSSNG data release DB6 (release date June 24, 2020), which were detected as previously described6. After identifying this recurrent variant in SHANK3, we then searched our in-house databases and performed literature searches for the same variant. Ethical review of these cohort studies was approved by institutional review boards and included accessing datasets through applications to Data Access Committees.
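The search criteria above amount to a filter followed by grouping on genomic site. The following sketch shows one way this could be expressed; the record layout, field names, effect labels, and toy coordinates are all hypothetical and do not reflect the MSSNG or SSC schemas, and the code assumes that indels have already been normalized to a single alignment convention.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple

# Hypothetical effect labels for loss-of-function consequences.
LOF_EFFECTS = {"stop_gained", "frameshift", "splice_site"}


class Variant(NamedTuple):
    family_id: str
    chrom: str
    pos: int
    ref: str
    alt: str
    effect: str
    de_novo: bool
    max_pop_freq: float          # highest frequency across reference panels
    predicted_deleterious: bool = False


def is_damaging(v: Variant) -> bool:
    """Loss of function, or missense predicted to be deleterious."""
    return v.effect in LOF_EFFECTS or (
        v.effect == "missense" and v.predicted_deleterious
    )


def recurrent_sites(
    variants: Iterable[Variant], max_freq: float = 0.001, min_families: int = 2
) -> Dict[Tuple[str, int, str, str], Set[str]]:
    """Group rare, damaging de novo variants by site and keep sites
    observed in at least `min_families` different families."""
    families = defaultdict(set)
    for v in variants:
        if v.de_novo and v.max_pop_freq < max_freq and is_damaging(v):
            families[(v.chrom, v.pos, v.ref, v.alt)].add(v.family_id)
    return {s: f for s, f in families.items() if len(f) >= min_families}


# Toy records with made-up coordinates.
toy = [
    Variant("famA", "chr22", 100, "G", "GG", "frameshift", True, 0.0),
    Variant("famB", "chr22", 100, "G", "GG", "frameshift", True, 0.0),
    Variant("famB", "chr22", 100, "G", "GG", "frameshift", True, 0.0),
    Variant("famC", "chr1", 500, "C", "T", "missense", True, 0.0),
    Variant("famD", "chr22", 100, "G", "GG", "frameshift", True, 0.01),
    Variant("famE", "chr2", 200, "A", "T", "stop_gained", True, 0.0),
]
hits = recurrent_sites(toy)
print(hits)
```

Grouping on the exact (chromosome, position, ref, alt) key is what makes prior normalization important: the same duplication reported at different positions within a homopolymer run would otherwise be counted as different sites.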
Phenotyping measures
Phenotypic data were extracted either from the original manuscripts, in which case we attempted to stay close to the original descriptions, or from the reference databases. In the latter case, clinical diagnosis of autism spectrum disorder was reported in the databases and was supported by ADI/ADOS. Intellectual disability was reported as a clinical diagnosis, and in most cases formal IQ testing was available for confirmation. Language delay was available as a clinical diagnosis, often with characterizations such as “minimally verbal” or “nonverbal”, and in many cases formal language measure scores were available for review. Information on psychiatric/neurological comorbidities was extracted from the original manuscripts, or was available as a clinician diagnosis or as a clinical concern based on the available continuous measures of such symptomatology (e.g., CBCL, RCADS).
Confirming representation of exon 21 in exome and WGS datasets
Given the high GC content of SHANK3, which can influence exon capture and sequencing52, we thought it was critical when assessing mutational frequency to confirm that there were no biases in read coverage of the site of the target variant within exon 21 (Supplementary Material; Fig. 1). Using whole-exome sequences from 298 patients and 462 controls from our internal dataset, captured with the Agilent SureSelect Clinical Research Exome V1, we showed that the coverage around the G duplication region is at the anticipated 120x (Supplementary Material; Fig. 1). This analysis also indicates that diagnostic exome sequencing will more than adequately capture and accurately genotype this position. WGS analysis of probands from MSSNG and SSC also confirms that exon 21 in SHANK3 is uniformly covered.
Protein and evolutionary conservation analysis
We used the DISOPRED3 predictor49 and the consensus of eight predictors from MobiDB-lite50 to map where the p.Ala1227Glyfs*69 variant falls with respect to intrinsically disordered regions (IDRs) of the protein. The variant isoforms were also analyzed using Feature Analysis of Intrinsically Disordered Regions55,56 and using PScore57. We analyzed the genomic conservation of the p.Ala1227Glyfs*69 variant with GERP64, UCSC PhyloP, and phastCons for primates, placental mammals, and 100 vertebrates65. The main text, tables, and figures (including Supplemental) have additional details relevant to the presentation of the results.
Polygenic risk score analysis (PRS)
PRS was calculated for all individuals of European ancestry in MSSNG (db6) and SSC, merged with the 1000 Genomes European population, using GWAS summary statistics derived from the iPSYCH Autism project, including 13,076 cases and 22,664 controls from Denmark88. This included probands MSSNG00342-003, MSSNG00342-004, 1-1047-003, 2-1774-003, and 14470.p1. A total of 25,837 SNPs were included in the PRS calculation. Since the proband 7-0527-003 was part of a later version of the MSSNG cohort (db7), he was not included in the initial PRS calculation. This individual’s PRS was calculated separately, together with his parents (7-0527-001 and 7-0527-002), using the same 25,837 SNPs included in the PRS calculations for the others and centered by the mean in the whole MSSNG/SSC/1000 Genomes European population. However, of the 25,837 SNPs, 1496 were missing due to sample quality in this family, and caution is needed in comparison with the other subjects. The approach for interpretation of the PRS data was based on previous studies18,88,89.
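For readers unfamiliar with the calculation, a polygenic score is commonly computed as a weighted sum of risk-allele dosages, which can then be centered on a reference population mean. The sketch below illustrates that general form and the effect of missing genotypes; it is a generic formulation with hypothetical SNP identifiers, weights, and reference scores, and is not a description of the exact pipeline used here.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def polygenic_score(
    dosages: Dict[str, Optional[float]], weights: Dict[str, float]
) -> Tuple[float, int]:
    """Sum weight * dosage over SNPs with a genotype call.
    Returns the raw score and the number of SNPs skipped as missing."""
    score = 0.0
    n_missing = 0
    for snp, weight in weights.items():
        dosage = dosages.get(snp)
        if dosage is None:
            n_missing += 1      # missing calls contribute nothing
            continue
        score += weight * dosage
    return score, n_missing


def center(score: float, reference_scores: List[float]) -> float:
    """Center a raw score on the mean of a reference population."""
    return score - mean(reference_scores)


# Hypothetical per-allele effect sizes and one individual's dosages (0, 1, 2).
toy_weights = {"snp1": 0.10, "snp2": -0.05, "snp3": 0.20}
toy_dosages = {"snp1": 2, "snp2": 1, "snp3": None}

raw, n_missing = polygenic_score(toy_dosages, toy_weights)
centered = center(raw, [0.0, 0.1, 0.2])

# Fraction of SNPs reported missing for family 7-0527 in the text.
missing_fraction = 1496 / 25837

print(raw, n_missing, centered, f"{100 * missing_fraction:.1f}%")
```

Because missing SNPs simply drop out of the sum, a score computed with roughly 6% of SNPs absent is not strictly on the same scale as scores computed from the full set, which is the reason for the caution stated above.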
Study recruitment
This study has complied with all relevant ethical regulations including obtaining informed consent from all participants and was approved by the Research Ethics Board at The Hospital for Sick Children.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
Access to the whole-genome sequence and phenotype information from MSSNG and SSC can be obtained by completing data access agreements (https://research.mss.ng and https://www.sfari.org/resource/sfari-base, respectively), as was done for this study. These two well-established and stable whole-genome sequence and phenotype resources are utilized by approved investigators worldwide. The 1000 Genomes sequencing data are publicly available via Amazon Web Services (https://docs.opendata.aws/1000genomes/readme.html). Access to data through other publications or resources is described in the main text and is outlined in Table 2. Whole-genome sequence for 7-0572-003 will be available in the MSSNG database in its next release but can be requested in advance by contacting the corresponding author. The relevant variant information from the exome or direct Sanger sequencing data for the individuals for whom whole-genome sequencing data do not exist and who are described for the first time in this paper (HNDS_0130-01; 1505221080) is found in Table 2. Additional data can also be requested by contacting the corresponding author.
References
Tammimies, K. et al. Molecular diagnostic yield of chromosomal microarray analysis and whole-exome sequencing in children with autism spectrum disorder. JAMA 314, 895–903 (2015).
Fernandez, B. A. & Scherer, S. W. Syndromic autism spectrum disorders: moving from a clinically defined to a molecularly defined approach. Dialogues Clin. Neurosci. 19, 353–372 (2019).
Betancur, C. Etiological heterogeneity in autism spectrum disorders: more than 100 genetic and genomic disorders and still counting. Brain Res. 1380, 42–77 (2011).
Satterstrom, F. K. et al. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell 180, 568–584.e23 (2020).
Bourgeron, T. From the genetic architecture to synaptic plasticity in autism spectrum disorder. Nat. Rev. Neurosci. 16, 551–563 (2015).
Yuen, R. K. et al. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder. Nat. Neurosci. 20, 602–611 (2017).
Sanders, S. J. et al. Insights into autism spectrum disorder genomic architecture and biology from 71 risk loci. Neuron 87, 1215–1233 (2015).
Woodbury-Smith, M. & Scherer, S. W. Progress in the genetics of autism spectrum disorder. Dev. Med. Child Neurol. 60, 445–451 (2018).
Vorstman, J. A. S. et al. Autism genetics: opportunities and challenges for clinical translation. Nat. Rev. Genet. 18, 362–376 (2017).
Srivastava, S. et al. Meta-analysis and multidisciplinary consensus statement: exome sequencing is a first-tier clinical diagnostic test for individuals with neurodevelopmental disorders. Genet. Med. 21, 2413–2421 (2019).
Schaaf, C. P. et al. A framework for an evidence-based gene list relevant to autism spectrum disorder. Nat. Rev. Genet. 21, 367–376 (2020).
Hoang, N., Buchanan, J. A. & Scherer, S. W. Heterogeneity in clinical sequencing tests marketed for autism spectrum disorders. npj Genom. Med. 3, 1–4 (2018).
Yehia, L. et al. Copy number variation and clinical outcomes in patients with germline PTEN mutations. JAMA Netw. Open 3, e1920415 (2020).
Scherer, S. W. & Dawson, G. Risk factors for autism: translating genomic discoveries into diagnostics. Hum. Genet. 130, 123–148 (2011).
Anagnostou, E. Clinical trials in autism spectrum disorder: evidence, challenges and future directions. Curr. Opin. Neurol. 31, 119–125 (2018).
Sahin, M. & Sur, M. Genes, circuits, and precision therapies for autism and related neurodevelopmental disorders. Science 350, 1–19 (2015).
Yuen, R. K. et al. Whole-genome sequencing of quartet families with autism spectrum disorder. Nat. Med. 21, 185–191 (2015).
Leblond, C. S. et al. Both rare and common genetic variants contribute to autism in the Faroe Islands. npj Genom. Med. 4, 1 (2019).
Simons VIP Consortium. Simons Variation in Individuals Project (Simons VIP): a genetics-first approach to studying autism spectrum and related neurodevelopmental disorders. Neuron 73, 1063–1067 (2012).
Miller, D. T. et al. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am. J. Hum. Genet. 86, 749–764 (2010).
Riggs, E. R. et al. Technical standards for the interpretation and reporting of constitutional copy-number variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics (ACMG) and the Clinical Genome Resource (ClinGen). Genet. Med. 22, 245–257 (2020).
Pinto, D. et al. Convergence of genes and cellular pathways dysregulated in autism spectrum disorders. Am. J. Hum. Genet. 94, 677–694 (2014).
Merikangas, A. K. et al. The phenotypic manifestations of rare genic CNVs in autism spectrum disorder. Mol. Psychiatry 20, 1366–1372 (2015).
Malhotra, D. & Sebat, J. CNVs: harbingers of a rare variant revolution in psychiatric genetics. Cell 148, 1223–1241 (2012).
Marshall, C. R. et al. Structural variation of chromosomes in autism spectrum disorder. Am. J. Hum. Genet. 82, 477–488 (2008).
Sanders, S. J. et al. Progress in understanding and treating SCN2A-mediated disorders. Trends Neurosci. 41, 442–456 (2018).
Frazier, T. W. Autism spectrum disorder associated with germline heterozygous PTEN mutations. Cold Spring Harb. Perspect. Med. 9, a037002 (2019).
Bernier, R. et al. Disruptive CHD8 mutations define a subtype of autism early in development. Cell 158, 263–276 (2014).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
Jiang, Y. H. et al. Detection of clinically relevant genetic variants in autism spectrum disorder by whole-genome sequencing. Am. J. Hum. Genet. 93, 249–263 (2013).
Trost, B. et al. Genome-wide detection of tandem DNA repeats that are expanded in autism. Nature 586, 80–86 (2020).
Fischbach, G. D. & Lord, C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010).
Werling, D. M. et al. An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder. Nat. Genet. 50, 727–736 (2018).
Lord, C., Cook, E. H., Leventhal, B. L. & Amaral, D. G. Autism spectrum disorders. Neuron 28, 355–363 (2000).
Rutter, M., LeCouteur, A. & Lord, C. (ADI™-R) Autism Diagnostic Interview–Revised. (WPS, 2003).
Durand, C. M. et al. Mutations in the gene encoding the synaptic scaffolding protein SHANK3 are associated with autism spectrum disorders. Nat. Genet. 39, 25–27 (2007).
Belmadani, M. et al. VariCarta: A Comprehensive Database of Harmonized Genomic Variants Found in Autism Spectrum Disorder Sequencing Studies. Autism Res. 12, 1728–1736 (2019).
De Rubeis, S. et al. Delineation of the genetic and clinical spectrum of Phelan-McDermid syndrome caused by SHANK3 point mutations. Mol. Autism 9, 1–20 (2018).
Zhou, W. Z. et al. Targeted resequencing of 358 candidate genes for autism spectrum disorder in a Chinese cohort reveals diagnostic potential and genotype–phenotype correlations. Hum. Mutat. 40, 801–815 (2019).
O’Roak, B. J. et al. Recurrent de novo mutations implicate novel genes underlying simplex autism risk. Nat. Commun. 5, 1–6 (2014).
Feliciano, P. et al. Exome sequencing of 457 autism families recruited online provides evidence for autism risk genes. npj Genom. Med. 4, 19 (2019).
Farwell, K. D. et al. Enhanced utility of family-centered diagnostic exome sequencing with inheritance model-based analysis: Results from 500 unselected families with undiagnosed genetic conditions. Genet. Med. 17, 578–586 (2015).
Lim, E. T. et al. Rates, distribution and implications of postzygotic mosaic mutations in autism spectrum disorder. Nat. Neurosci. 20, 1217–1224 (2017).
Ramu, A. et al. DeNovoGear: De novo indel and point mutation discovery and phasing. Nat. Methods 10, 985–987 (2013).
Bidinosti, M. et al. CLK2 inhibition ameliorates autistic features associated with SHANK3 deficiency. Science 351, 1199–1203 (2016).
Gouder, L. et al. Altered spinogenesis in iPSC-derived cortical neurons from patients with autism carrying de novo SHANK3 mutations. Sci. Rep. 9, 94 (2019).
Gauthier, J. et al. De novo mutations in the gene encoding the synaptic scaffolding protein SHANK3 in patients ascertained for schizophrenia. Proc. Natl Acad. Sci. USA 107, 7863–7868 (2010).
Durand, C. M. et al. SHANK3 mutations identified in autism lead to modification of dendritic spine morphology via an actin-dependent mechanism. Mol. Psychiatry 17, 71–84 (2012).
Jones, D. T. & Cozzetto, D. DISOPRED3: Precise disordered region predictions with annotated protein-binding activity. Bioinformatics 31, 857–863 (2015).
Necci, M., Piovesan, D., Dosztanyi, Z. & Tosatto, S. C. E. MobiDB-lite: Fast and highly specific consensus prediction of intrinsic disorder in proteins. Bioinformatics 33, 1402–1404 (2017).
Csizmok, V., Follis, A. V., Kriwacki, R. W. & Forman-Kay, J. D. Dynamic protein interaction networks and new structural paradigms in signaling. Chem. Rev. 116, 6424–6462 (2016).
Moessner, R. et al. Contribution of SHANK3 mutations to autism spectrum disorder. Am. J. Hum. Genet. 81, 1289–1297 (2007).
Zeng, M. et al. Phase Transition in Postsynaptic Densities Underlies Formation of Synaptic Complexes and Synaptic Plasticity. Cell 166, 1163–1175.e12 (2016).
Chen, X., Wu, X., Wu, H. & Zhang, M. Phase separation at the synapse. Nat. Neurosci. 23, 301–310 (2020).
Zarin, T. et al. Proteome-wide signatures of function in highly diverged intrinsically disordered regions. eLife 8, 1–26 (2019).
Zarin, T. et al. Identifying molecular features that are associated with biological function of intrinsically disordered protein regions. bioRxiv 1–23 (2020).
Vernon, R. M. C. et al. Pi-Pi contacts are an overlooked protein feature relevant to phase separation. eLife 7, 1–48 (2018).
Tsang, B., Pritišanac, I., Scherer, S. W., Moses, A. M. & Forman-Kay, J. D. Phase Separation as a Missing Mechanism for Interpretation of Disease Mutations. Cell 183, 1742–1756 (2020).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Phan, L., Jin, Y. & Zhang, Z. ALFA: Allele Frequency Aggregator. National Center for Biotechnology Information, U.S. National Library of Medicine (2020).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Reuter, M. S. et al. The Personal Genome Project Canada: findings from whole genome sequences of the inaugural 56 participants. CMAJ 190, E126–E136 (2018).
Pinese, M. et al. The Medical Genome Reference Bank contains whole genome and phenotype data of 2570 healthy elderly. Nat. Commun. 11, 435 (2020).
Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).
Kuhn, R. M., Haussler, D. & James Kent, W. The UCSC genome browser and associated tools. Brief. Bioinforma. 14, 144–161 (2013).
Henn, B. M. et al. Distance from sub-Saharan Africa predicts mutational load in diverse human genomes. Proc. Natl Acad. Sci. USA 113, E440–E449 (2016).
Speed, H. E. et al. Autism-associated insertion mutation (InsG) of Shank3 exon 21 causes impaired synaptic transmission and behavioral deficits. J. Neurosci. 35, 9648–9665 (2015).
De Sena Cortabitarte, A. et al. Investigation of SHANK3 in schizophrenia. Am. J. Med. Genet., Part B: Neuropsychiatr. Genet. 174, 390–398 (2017).
Leblond, C. S. et al. Meta-analysis of SHANK mutations in autism spectrum disorders: a gradient of severity in cognitive impairments. PLoS Genet. 10, e1004580 (2014).
Bonaglia, M. C. et al. Disruption of the ProSAP2 gene in a t(12;22)(q24.1;q13.3) is associated with the 22q13.3 deletion syndrome. Am. J. Hum. Genet. 69, 261–268 (2001).
Du, X. et al. Genetic diagnostic evaluation of trio-based whole exome sequencing among children with diagnosed or suspected autism spectrum disorder. Front. Genet. 9, 1–8 (2018).
Pelphrey, K. A., Shultz, S., Hudac, C. M. & Vander Wyk, B. C. Development in autism spectrum disorder. J. Child Psychol. Psychiatry 52, 631–644 (2012).
Castelbaum, L., Sylvester, C. M., Zhang, Y., Yu, Q. & Constantino, J. N. On the nature of monozygotic twin concordance and discordance for autistic trait severity: a quantitative analysis. Behav. Genet. 50, 263–272 (2020).
Myers, S. M. et al. Insufficient Evidence for “Autism-Specific” Genes. Am. J. Hum. Genet. 106, 587–595 (2020).
State, M. W. & Levitt, P. The conundrums of understanding genetic risks for autism spectrum disorders. Nat. Neurosci. 14, 1499–1506 (2011).
Bruford, E. A. et al. Guidelines for human gene nomenclature. Nat. Genet. 52, 754–758 (2020).
Stenson, P. D. et al. The Human Gene Mutation Database (HGMD®): optimizing its use in a clinical diagnostic or research setting. Hum. Genet. 139, 1197–1207 (2020).
Zhou, Y. et al. Mice with Shank3 Mutations Associated with ASD and Schizophrenia Display Both Shared and Distinct Defects. Neuron 89, 147–162 (2016).
Uher, R. The role of genetic variation in the causation of mental illness: an evolution-informed framework. Mol. Psychiatry 14, 1072–1082 (2009).
Yuen, R. K. et al. Genome-wide characteristics of de novo mutations in autism. npj Genom. Med. 1, 16027 (2016).
Ellegren, H., Smith, N. G. C. & Webster, M. T. Mutation rate variation in the mammalian genome. Curr. Opin. Genet. Dev. 13, 562–568 (2003).
Crow, J. F. The origins, patterns and implications of human spontaneous mutation. Nat. Rev. Genet. 1, 40–47 (2000).
Michaelson, J. J. et al. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation. Cell 151, 1431–1442 (2012).
Růžička, M. et al. DNA mutation motifs in the genes associated with inherited diseases. PLoS ONE 12, 1–16 (2017).
Montgomery, S. B. et al. The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes. Genome Res. 23, 749–761 (2013).
Swami, M. Mutation: It’s the CpG content that counts. Nat. Rev. Genet. 11, 103283 (2010).
Leblond, C. S. et al. Genetic and functional analyses of SHANK2 mutations suggest a multiple hit model of autism spectrum disorders. PLoS Genet. 8, e1002521 (2012).
Grove, J. et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 51, 431–444 (2019).
D’Abate, L. et al. Predictive impact of rare genomic copy number variations in siblings of individuals with autism spectrum disorders. Nat. Commun. 10, 5519 (2019).
Acknowledgements
We thank the families for their participation over the years. We also thank the Participant Advisory Committee of the Province of Ontario Neurodevelopmental Network (https://pond-network.ca/patient-advisory-committee/) for their regular input and perspectives, which ensure that the initiatives and outcomes of our research are participant-driven. We also thank The Centre for Applied Genomics and Verily Life Sciences for their analytical and technical support, as well as staff at Autism Speaks for organizational and fundraising support. We thank Jonathon Ditlev for insightful discussion. This work was funded by Autism Speaks, Autism Speaks Canada, the University of Toronto McLaughlin Centre, the Canada Foundation for Innovation, the Canadian Institutes of Health Research (CIHR), Genome Canada/Ontario Genomics Institute, the Government of Ontario, Brain Canada, Ontario Brain Institute Province of Ontario Neurodevelopmental Disorders (POND), and The Hospital for Sick Children Foundation. L.O.L. holds the Lap-Chee Tsui Postdoctoral Fellowship from The Hospital for Sick Children. S.W.S. holds the Northbridge Chair in Paediatric Research at The Hospital for Sick Children and the University of Toronto.
Author information
Contributions
L.O.L., J.L.H., and S.W.S. conceived and designed the experiments. M.S.R., D.R., B.T., M.Z., O.R., L.Y.S.L., C.R.M., E.D.B., and R.D. analyzed the genome sequence data. L.O.L., I.V., A.M., and J.D.F.-K. performed protein and evolutionary conservation analysis. A.I., K.C., S.S., B.G., T.F., J.V., S.S., S.M.E.L., P.S., A.-C.T., M.W., S.L., J.L., T.B., and E.A. diagnosed, examined, and recruited participants and completed genotype-phenotype correlations. L.O.L., J.L.H., M.S.R., D.R., I.P., A.M., J.D.F.-K., B.T., M.Z., C.R.M., D.H., C.A.B., E.A., and S.W.S. contributed to different components of the analyses and data interpretation. L.O.L., J.L.H., E.A., and S.W.S. wrote the manuscript. L.O.L. and J.L.H. contributed equally to the manuscript.
Ethics declarations
Competing interests
S.W.S. is on the Scientific Advisory Committees of Deep Genomics and Population Bio and is an Academic Consultant for King Abdulaziz University. The remaining authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Loureiro, L.O., Howe, J.L., Reuter, M.S. et al. A recurrent SHANK3 frameshift variant in Autism Spectrum Disorder. npj Genom. Med. 6, 91 (2021). https://doi.org/10.1038/s41525-021-00254-0