Copy number variations play important roles in heredity of common diseases: a novel method to calculate heritability of a polymorphism

Nagao, Yoshiro

doi:10.1038/srep17156

Open access
Published: 24 November 2015

Volume 5, article number 17156, (2015)

Scientific Reports


Abstract

“Missing heritability” in genome-wide association studies, the failure of the detected variants to account for a considerable fraction of heritability, is a current puzzle in human genetics. To solve this puzzle, the involvement of genetic variants such as rare single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) has been proposed. Many papers have been published estimating the heritability of sets of polymorphisms; however, no paper has discussed how to estimate the heritability of a single polymorphism. Here I present a simple but rational method to calculate the heritability of an individual polymorphism, h_p². Using this method, I carried out a trial calculation of h_p² for CNVs and SNPs using published data. The h_p² of some CNVs turned out to be quite large. Notably, about 25% of the heritability of type 2 diabetes mellitus could be accounted for by one CNV, and about 15% of the heritability of schizophrenia by four CNVs. The results suggest that a large part of the missing heritability could be accounted for by re-evaluating CNVs that have already been found and by searching for novel CNVs with large h_p².


Introduction

Genome-wide association studies (GWAS) have identified hundreds of gene polymorphisms associated with common diseases; however, every effort to explain the heritability of a disease by the single nucleotide polymorphisms (SNPs) detected in GWAS has failed^1,2,3. In 2010, the Wellcome Trust Case Control Consortium et al. reported a genome-wide association study of copy number variations (CNVs) for eight common diseases and concluded that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases⁴. Because efforts have largely focused on common genetic variants, one hypothesis holds that much of the missing heritability is due to rare genetic variants^2,5. However, it has not yet been reported that a large part of the heritability of a disease is accounted for by rare variants. Although many papers have reported the contribution of sets of variants to heritability by quantitative genetic analysis, no paper has discussed how to estimate the heritability of a single polymorphism. Here I describe a novel method to calculate the heritability of an individual polymorphism, such as a SNP or a CNV.

Results

Definitions and premises

The frequency of the risk allele in the general population: p.
The frequency of the non-risk allele in the general population: q.
The frequency of the risk allele in patients: u.
The frequency of the non-risk allele in patients: v.
The prevalence of the disease: P.

If the frequencies of the risk and non-risk alleles in asymptomatic individuals are denoted by x and y, respectively, the following relationships hold:
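The relationships themselves are missing from this copy. A reconstruction consistent with the definitions above, and with the worked example in the Methods (where u = 0.0039, x = 0 and P = 0.01 give p = 0.000039), is shown below. The first line is Equation [1]; treating the other two lines as Equations [2] and [3] is an assumption.

$$
\begin{aligned}
p &= Pu + (1-P)\,x \qquad [1]\\
q &= Pv + (1-P)\,y\\
p + q &= u + v = x + y = 1
\end{aligned}
$$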

The odds ratio, OR, is given by the following:
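The expression is missing from this copy. Written in the symbols defined above, the standard allelic odds ratio comparing patients with asymptomatic individuals, which is presumably what is meant, is:

$$
\mathrm{OR} = \frac{u/v}{x/y} = \frac{uy}{vx}
$$

When x = 0, as for a CNV detected only in patients, this gives OR = +∞.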

Case-control studies usually report u, x and OR, from which p can be calculated with Equation [1]. When only p and OR are available, for example from a SNP database, u or v must be calculated instead. Reasonable solutions for u and v cannot be obtained from Equations [1, 2, 3]; instead, they can be estimated by approximate solutions. Next, the genotype frequencies of the first-degree relatives must be calculated to estimate heritability. This requires Bayes’ method, because the frequencies of their risk genotype(s) are posterior probabilities, conditional on the proband's genotype. The following definitions are needed for these calculations.
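A minimal sketch of both directions, assuming p = Pu + (1 − P)x and OR = uy/(vx). The forward direction is Equation [1]. The paper's own approximate solution for the reverse direction is not given here, so the second function uses one common approximation, which is our assumption: because P is small, asymptomatic individuals are treated as the general population (x ≈ p), giving u = OR·p/(1 − p + OR·p). Both function names are hypothetical.

```python
def population_risk_allele_freq(u: float, x: float, prevalence: float) -> float:
    """Equation [1]: p = P*u + (1 - P)*x."""
    return prevalence * u + (1.0 - prevalence) * x


def patient_risk_allele_freq(p: float, odds_ratio: float) -> float:
    """Approximate u from p and OR, ASSUMING asymptomatic individuals have x ~= p.

    From OR = (u/v)/(x/y) with x = p and y = 1 - p, u/v = OR*p/(1 - p).
    This approximation is an assumption, not necessarily the paper's method.
    """
    return odds_ratio * p / (1.0 - p + odds_ratio * p)


# Values from the 16p11.2 duplication example in the Methods.
p = population_risk_allele_freq(u=0.0039, x=0.0, prevalence=0.01)
print(f"p = {p:.6f}")  # p = 0.000039
```

The approximation is only reasonable when P is small, so that the controls differ little from the whole population.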

A and a denote the dominant and recessive alleles, respectively.
The genotype frequency of AA for the proband: α.
The genotype frequency of Aa for the proband: β.
The genotype frequency of aa for the proband: γ.
The frequency of the risk genotype(s) in the general population: X₁.
The frequency of the risk genotype(s) in the first-degree relatives: Y₁.

Table 1 shows the probability of each genotype for a sibling and for an offspring. The probabilities for a parent, which are the same as those for an offspring, are omitted here. The procedure for calculating these genotype probabilities is described in the Methods.
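Table 1 is not reproduced in this copy. As an illustration only, the sketch below computes genotype probabilities for relatives of a proband under assumptions we state explicitly: A is the risk allele, the proband's two alleles are independent with risk-allele frequency u (so α = u², β = 2uv, γ = v²), the population is in Hardy–Weinberg equilibrium with risk-allele frequency p, and mating is random. The expressions in Table 1 may be written differently, for example in terms of α, β and γ; the helper names are hypothetical.

```python
from typing import Dict


def offspring_genotype_probs(u: float, p: float) -> Dict[str, float]:
    """Genotype probabilities for an offspring of a proband (hypothetical helper).

    Assumes the proband passes the risk allele A with probability u and the
    unrelated partner passes A with the population frequency p.
    """
    v, q = 1.0 - u, 1.0 - p
    return {"AA": u * p, "Aa": u * q + v * p, "aa": v * q}


def sibling_genotype_probs(u: float, p: float) -> Dict[str, float]:
    """Genotype probabilities for a full sibling of a proband (hypothetical helper).

    Each parent passes the sibling either the allele it gave the proband
    (probability 1/2; A with probability u) or its other allele
    (probability 1/2; A with population frequency p).
    """
    t = (u + p) / 2.0  # probability that one parent passes A to the sibling
    s = 1.0 - t        # equals (v + q) / 2
    return {"AA": t * t, "Aa": 2.0 * t * s, "aa": s * s}


for name, probs in [("sibling", sibling_genotype_probs(0.2, 0.1)),
                    ("offspring", offspring_genotype_probs(0.2, 0.1))]:
    print(name, {g: round(val, 4) for g, val in probs.items()})
```

Under the AD model, Y₁ is 1 − P(aa) from these dictionaries; when the risk allele acts recessively, Y₁ is P(AA).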

Table 1 Probability of each genotype of a sibling and an offspring.


Next, the calculation of the heritability of a polymorphism, the main subject of this report, is described.

Heritability of a polymorphism under an autosomal dominant (AD) model

When genotypes AA and Aa have the same risk effect, Y₁ of a sibling is calculated from the expressions in Table 1 as follows:
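The expression is missing from this copy. Assuming that the proband's alleles are independent with risk-allele frequency u and that the parents are random members of a Hardy–Weinberg population, a sibling carries no risk allele only if each parent transmits a non-risk allele, each with probability (v + q)/2, which suggests the following (our reconstruction, to be checked against Table 1):

$$
Y_{S1} = 1 - \left(\frac{v+q}{2}\right)^{2}, \qquad X_1 = 1 - q^{2}
$$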

Y₁ of an offspring is calculated as follows:
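This expression is also missing. Under the same assumptions, an offspring carries no risk allele only if the proband transmits a non-risk allele (probability v) and the unrelated partner does too (probability q), which suggests (our reconstruction):

$$
Y_{O1} = 1 - vq
$$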

The inequality between the arithmetic and geometric means implies that Y_O1 > Y_S1 unless v equals q.
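The comparison can be spelled out with the reconstructed expressions Y_S1 = 1 − ((v + q)/2)² and Y_O1 = 1 − vq, which assume Hardy–Weinberg proportions and random mating:

$$
Y_{O1} - Y_{S1} = \left(\frac{v+q}{2}\right)^{2} - vq = \left(\frac{v-q}{2}\right)^{2} \ge 0,
$$

with equality only when v = q.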

Next, consider the incidence rate of the disease among first-degree relatives, Q. When a polymorphism accounts for part of the patient group, its share of the prevalence P is the population attributable risk, P(1 − v/q) (Fig. 1A). Suppose that the risk allele of the polymorphism is the only genetic cause of the disease. For first-degree relatives of patients who do not carry the risk allele, the incidence rate does not differ from that in the general population. Therefore, owing to this polymorphism, Q exceeds P by the attributable share P(1 − v/q) multiplied by (Y₁/X₁ − 1) (Fig. 1B). The incidence rate of the disease for a sibling, Q_s, is then given by Equation [6]:
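Equation [6] is missing from this copy. Read literally, the preceding paragraph says that the attributable share P(1 − v/q) of the prevalence is raised by the factor (Y₁/X₁ − 1) among relatives, which suggests the form below. It has the two-term structure that the section on two or more polymorphisms refers to as the “second clause” of Equation [6], but it is our reconstruction rather than a copy of the original:

$$
Q_s = P + P\left(1 - \frac{v}{q}\right)\left(\frac{Y_{S1}}{X_1} - 1\right) \qquad [6]
$$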

The incidence rate for an offspring, Q_o, is given by Equation [7]:
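Equation [7] is also missing. By analogy with the sibling case, a reconstruction (ours) is given below. Combined with Y_O1 = 1 − vq and X₁ = 1 − q², it reproduces the recurrence threshold of +2.25998 SD reported in the worked example in the Methods, which supports this reading:

$$
Q_o = P + P\left(1 - \frac{v}{q}\right)\left(\frac{Y_{O1}}{X_1} - 1\right) \qquad [7]
$$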

Once Q_s or Q_o has been estimated, the heritability of the polymorphism, h_p², is calculated with Falconer’s liability threshold model⁶.
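Falconer's formula is not written out here. In its simplest form, which we assume (the paper may apply a refinement), the thresholds x_g and x_r are the standard-normal quantiles that cut off the upper P and Q of the liability distribution, a_g is the mean liability of affected individuals, and r = 1/2 is the coefficient of relationship for first-degree relatives:

$$
x_g = \Phi^{-1}(1-P),\quad x_r = \Phi^{-1}(1-Q),\quad a_g = \frac{\varphi(x_g)}{P},\quad h_p^{2} = \frac{x_g - x_r}{r\,a_g}
$$

For P = 0.01 this gives x_g ≈ 2.32635 and a_g ≈ 2.6652, the values used in the worked example in the Methods.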

Heritability of a polymorphism under an autosomal recessive (AR) model

Some polymorphisms are known to have a recessive effect. If the risk allele of a polymorphism acts recessively, the frequencies of the risk genotype for a sibling and for an offspring, Y_S1 and Y_O1, are, respectively:
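The expressions are missing from this copy. Under the same assumptions as for the AD case (independent proband alleles with risk-allele frequency u, a Hardy–Weinberg population with risk-allele frequency p, random mating), a relative has the risk genotype only if both transmitted alleles are risk alleles, which suggests (our reconstruction):

$$
Y_{S1} = \left(\frac{u+p}{2}\right)^{2}, \qquad Y_{O1} = up, \qquad X_1 = p^{2}
$$

The arithmetic-geometric-mean argument then gives Y_S1 ≥ Y_O1 in the recessive case, the reverse of the dominant case.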

In the recessive model, the homozygote is the risk genotype. Therefore, among patients who carry the risk allele, the proportion with the risk genotype is u²/(u² + 2uv). Considering only the effect of the polymorphism, the incidence rates of the disease among siblings and among offspring are, respectively:

Heritability of a polymorphism under other inheritance models

h_p² can be estimated for a polymorphism under any other inheritance model, provided that the frequency of the risk genotype(s) in first-degree relatives can be calculated. If a polymorphism is located on an autosome and the OR of the heterozygote is smaller than that of the homozygote, the h_p² of this polymorphism is smaller than h_p² under the AD model and larger than h_p² under the AR model.

Calculation of the heritability of two or more polymorphisms

Falconer’s method calculates “liability thresholds” corresponding to the prevalence of a disease in the general population and to the recurrence rate in first-degree relatives. Both thresholds are measured in standard deviations, and heritability is estimated from the difference between them⁶. The heritability of two or more polymorphisms can also be calculated: the second term of Equation [6] or [7] is computed for each polymorphism, and these terms are summed and added to P.
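A minimal sketch of the multi-polymorphism calculation for offspring under the AD model. It assumes the reconstructed form Q_o = P + Σ P(1 − v/q)(Y_O1/X₁ − 1) with Y_O1 = 1 − vq and X₁ = 1 − q², and the simple Falconer estimate h² = (x_g − x_r)/(r·a_g) with r = 1/2; both are assumptions about formulas not shown in this copy, and the function names are hypothetical.

```python
from statistics import NormalDist
from typing import Iterable, Tuple

STD_NORMAL = NormalDist()  # standard normal liability scale (mean 0, SD 1)


def offspring_excess_risk(u: float, p: float, prevalence: float) -> float:
    """Second term of the reconstructed Equation [7] for one polymorphism (AD model)."""
    v, q = 1.0 - u, 1.0 - p
    y_o1 = 1.0 - v * q                          # risk-genotype frequency in offspring
    x_1 = 1.0 - q * q                           # risk-genotype frequency in the population
    attributable = prevalence * (1.0 - v / q)   # population attributable risk, P(1 - v/q)
    return attributable * (y_o1 / x_1 - 1.0)


def falconer_h2(prevalence: float, recurrence: float, r: float = 0.5) -> float:
    """Simple Falconer estimate: (x_g - x_r) / (r * a_g)."""
    x_g = STD_NORMAL.inv_cdf(1.0 - prevalence)  # threshold, general population
    x_r = STD_NORMAL.inv_cdf(1.0 - recurrence)  # threshold, relatives
    a_g = STD_NORMAL.pdf(x_g) / prevalence      # mean liability of affected individuals
    return (x_g - x_r) / (r * a_g)


def combined_h2(variants: Iterable[Tuple[float, float]], prevalence: float) -> float:
    """Add each variant's excess term to P, then apply Falconer's model once.

    `variants` holds (u, p) pairs: risk-allele frequency in patients and in the population.
    """
    recurrence = prevalence + sum(offspring_excess_risk(u, p, prevalence) for u, p in variants)
    return falconer_h2(prevalence, recurrence)


# First pair: 16p11.2 duplication from the worked example; second pair: hypothetical.
print(round(combined_h2([(0.0039, 0.000039), (0.02, 0.01)], prevalence=0.01), 4))
```

Summing on the incidence scale before converting to liability follows the procedure described above; with an empty list, Q equals P and the estimate is zero.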

Estimation for various CNVs and SNPs reported in the literature

Most germline CNVs are heritable⁷. However, the mode of inheritance of a CNV is not always known. Furthermore, de novo CNVs are sometimes identified in association studies (3). The heritability of a disease has often been estimated from twin studies. Monozygotic (MZ) twins share all germline polymorphisms, including de novo variants, whereas dizygotic (DZ) twins usually do not share a de novo polymorphism. Because heritability is calculated from the difference between the concordance rates of MZ and DZ twins, de novo polymorphisms also contribute to heritability estimated in twin studies. When the contribution of a CNV to the heritability of a disease is estimated with Falconer’s model, the recurrence risk of carrying the CNV for a sibling cannot, in theory, be used, because the CNV may have arisen de novo in the proband. On the other hand, the recurrence risk for an offspring can be used, because all germline polymorphisms, including de novo ones, can in principle be transmitted to offspring.

Table 2 lists various CNVs and SNPs reported in the literature; their h_p² values were calculated for offspring under the AD model. As shown in Table 2, CNVs generally have larger h_p² (>0.01). A noteworthy result was that about 25% of the heritability of type 2 diabetes mellitus (T2DM) could be accounted for by one CNV, a value greater than the previously estimated heritability explained by all variants identified in a GWAS published in 2012⁸. Another noteworthy result was that about 15% of the heritability of schizophrenia could be accounted for by four CNVs, although this value was smaller than the previously estimated heritability (23%) explained by all variants identified in a GWAS published in 2012⁹. For schizophrenia, the h_p² of a CNV detected only in patients (OR = +∞) turned out to be large. These results suggest that a large part of the missing heritability of common diseases could be accounted for by certain CNVs. 15q13.3 microdeletions have been reported to be associated not only with schizophrenia but also with idiopathic generalized epilepsy (IGE)^2,10. Although accurate prevalence data could not be obtained for IGE, which comprises several types of epilepsy, the h_p² for IGE was estimated to be 0.13–0.15 (not shown in Table 2). CNVs have been suspected to be involved in the pathophysiology of neuropsychiatric conditions¹¹. The results of this trial estimation of h_p² suggest that CNVs might be the major genetic cause of neuropsychiatric disorders.

Table 2 Results of a trial to calculate h_p² of CNVs and SNPs using published data.


Comparison of the required number of polymorphisms to explain a heritability

Previous studies have estimated the heritability of sets of polymorphisms. In 2009, Pawitan et al. showed how many variants would be needed to explain a heritability of 0.4¹². To confirm that the results of the method described in the present study are consistent with those of other approaches, I estimated the number of genetic variants required under the AD model to explain a heritability of 0.4 when the prevalence of a disease is 0.01. This estimation assumed that the h_p² values of individual variants are additive; in other words, it aimed to account for “narrow-sense” heritability. Table 3 compares the results of the present method with those of Pawitan et al. The required number of genetic variants, calculated using the median of the range of variants in each category, did not differ from their approximation for the same category, except for the common variants in category 1.
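Under the additivity assumption stated above, the number of variants needed is the target heritability divided by the per-variant h_p², rounded up. A minimal sketch with a hypothetical function name; the h_p² value in the usage line is illustrative, not taken from Table 3:

```python
import math


def variants_needed(h2_per_variant: float, target_h2: float = 0.4) -> int:
    """Smallest n with n * h2_per_variant >= target_h2 (additive narrow-sense model)."""
    if h2_per_variant <= 0:
        raise ValueError("h2_per_variant must be positive")
    return math.ceil(target_h2 / h2_per_variant)


print(variants_needed(0.003))  # hypothetical h_p^2 of 0.003 per variant
```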

Table 3 Various categories of variants and the number of variants to explain heritability of 0.4.


Discussion

Estimates of the heritability of polymorphisms have mainly been made for SNPs found in GWAS^1,2,3,12,13. The heritability of common diseases is thought to be due to multiple genes of small effect size, and even qualitative disorders can be interpreted simply as the extremes of quantitative dimensions, that is, by quantitative genetic analysis¹⁴. Recent studies have demonstrated interaction effects and collective effects of SNPs on quantitative genetic traits^15,16,17. Here, however, I discuss conventional quantitative analysis under the premise that the effects of polymorphisms are simply additive. In quantitative genetic analysis, a latent susceptibility (or liability) that varies between individuals is assumed¹². The liability can be due to genetic and environmental factors, and heritability is defined as the proportion of the variance in liability due to genetic factors. To calculate the liability contributed by a SNP, the allelic OR or the OR of the risk genotype is the fundamental quantity for estimating penetrance^12,13. Therefore, when a SNP is detected only in patients (OR = +∞), the calculation is theoretically impossible in quantitative genetic analysis. In short, such analyses handle the quantitative effects of genes with small effect sizes and do not assume the participation of a gene with such a large effect size (OR = +∞). The Wellcome Trust Case Control Consortium et al. published an estimation of the heritability of common CNVs in 2010, and they likewise did not take into consideration CNVs detected only in patients⁴. However, CNVs are sometimes detected only in patients, as shown in Table 2.
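The point about OR = +∞ can be made concrete with the quantities defined in the Results. When x = 0, the odds ratio uy/(vx) is infinite, whereas the attributable fraction 1 − v/q used by the present method stays finite. A minimal sketch with hypothetical helper names, using the 16p11.2 duplication figures from the worked example in the Methods:

```python
import math


def allelic_odds_ratio(u: float, x: float) -> float:
    """OR = (u/v)/(x/y); returns inf when the allele is absent from asymptomatic individuals."""
    v, y = 1.0 - u, 1.0 - x
    if x == 0.0:
        return math.inf
    return (u * y) / (v * x)


def attributable_fraction(u: float, p: float) -> float:
    """1 - v/q, the fraction of P attributed to the risk allele; finite even when OR is inf."""
    return 1.0 - (1.0 - u) / (1.0 - p)


u, x, prevalence = 0.0039, 0.0, 0.01
p = prevalence * u + (1.0 - prevalence) * x   # Equation [1]
print(allelic_odds_ratio(u, x))                # inf
print(attributable_fraction(u, p))
```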

In this report, a novel method to calculate the heritability of a single polymorphism was presented. A trial to estimate the number of genetic variants required under the AD model to explain a given heritability showed that the results of the present method are consistent with those of a quantitative genetic analysis (Table 3), except for the common variants in category 1. The calculation procedure does not use penetrance; instead, it uses the population attributable risk, which remains finite when OR is +∞. With this method, the heritability of some CNVs calculated under the AD model was found to be quite large. The mode of inheritance of CNVs is often unknown, and usually only an allelic OR is available for a CNV. Although heritability was calculated for CNVs only under the AD model, the results suggest that a large part of the missing heritability could be accounted for by re-evaluating CNVs that have already been found and by searching for novel CNVs with large h_p². The results of this study also suggest that CNVs might be the major genetic cause of neuropsychiatric disorders. In conclusion, CNVs turned out to play important roles in the familial aggregation of common diseases.

Methods

Calculation of genotype probabilities for a sibling

Calculating the genotype probabilities for a sibling requires Bayes’ method. Table 4 shows an example of the calculation of genotype probabilities by Bayes’ method for the father of the proband. The resulting posterior probability of each paternal genotype equals the general-population frequency of the father's other allele (A or a), that is, the allele other than the one he transmitted (A).
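The Bayes step can be sketched as follows, assuming the father is a random member of a Hardy–Weinberg population (prior p², 2pq, q²) and that the allele he transmitted to the proband is A. The likelihood of transmitting A is 1, 1/2 and 0 for AA, Aa and aa, so the posterior reduces to (p, q, 0), matching the statement above. The function name is hypothetical.

```python
from typing import Dict


def father_posterior_given_transmitted_a(p: float) -> Dict[str, float]:
    """Posterior genotype probabilities of a parent who transmitted allele A."""
    q = 1.0 - p
    prior = {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}   # Hardy-Weinberg prior
    likelihood = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}           # P(transmit A | genotype)
    joint = {g: prior[g] * likelihood[g] for g in prior}
    evidence = sum(joint.values())                            # equals p
    return {g: joint[g] / evidence for g in joint}


post = father_posterior_given_transmitted_a(p=0.1)
print({g: round(val, 6) for g, val in post.items()})  # {'AA': 0.1, 'Aa': 0.9, 'aa': 0.0}
```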

Table 4 An example of the calculation of genotype probabilities by Bayes’ method when the genotype of the proband is AA.


The genotype probabilities for a sibling are then calculated as shown in Table 5. In Table 5, P1 and P2 are the posterior probabilities of the genotypes of the father and mother, respectively, and P3 is the conditional probability of the sibling's genotype. Each joint probability is the product of F, P1, P2 and P3. The sum of the joint probabilities for each genotype is shown in Table 1.
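The enumeration can be sketched by summing joint probabilities over the proband's alleles and over which allele each parent passes to the sibling. The sketch assumes the proband's genotype frequencies are α = u², β = 2uv, γ = v² (independent alleles with risk-allele frequency u), that each parent's non-transmitted allele is A with population frequency p (the Table 4 result), and that an Aa proband received A from either parent with probability 1/2. It organises the terms by transmitted alleles rather than by parental genotypes, so its rows do not correspond one-to-one with Table 5; the function name is hypothetical.

```python
from itertools import product
from typing import Dict


def sibling_genotype_probs_enumerated(u: float, p: float) -> Dict[str, float]:
    """Sum F * P(paternal allele) * P(maternal allele) for a sibling's genotype."""
    v = 1.0 - u
    result = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    # Ordered proband alleles (from father, from mother) and their probability F.
    proband_pairs = {
        ("A", "A"): u * u,
        ("A", "a"): u * v,
        ("a", "A"): v * u,
        ("a", "a"): v * v,
    }
    for (from_father, from_mother), f in proband_pairs.items():
        for sib_pat, sib_mat in product("Aa", repeat=2):
            joint = f
            for transmitted, sib_allele in ((from_father, sib_pat), (from_mother, sib_mat)):
                # With probability 1/2 the parent passes the allele it gave the proband;
                # otherwise its other allele, which is A with population frequency p.
                p_other = p if sib_allele == "A" else 1.0 - p
                p_same = 1.0 if sib_allele == transmitted else 0.0
                joint *= 0.5 * p_same + 0.5 * p_other
            n_risk = (sib_pat == "A") + (sib_mat == "A")
            result[("aa", "Aa", "AA")[n_risk]] += joint
    return result


print(sibling_genotype_probs_enumerated(u=0.2, p=0.1))
```

The sums agree with the closed form in which each parent passes A with probability (u + p)/2.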

Table 5 The calculation procedure of the genotype probabilities for a sibling.


Calculation of genotype probabilities for an offspring

Bayes’ method is not needed to calculate the genotype probabilities for an offspring. The calculation procedure is shown in Table 6, and the sum of the joint probabilities for each genotype is shown in Table 1.
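For an offspring, no Bayes step is needed: the proband transmits one of its two alleles and the partner, assumed to be a random member of a Hardy–Weinberg population, transmits A with probability p. A minimal enumeration under the assumption that the proband's genotype frequencies are α = u², β = 2uv, γ = v²; the function name is hypothetical:

```python
from typing import Dict


def offspring_genotype_probs_enumerated(u: float, p: float) -> Dict[str, float]:
    """Sum F * P(allele from proband) * P(allele from partner) over all cases."""
    v, q = 1.0 - u, 1.0 - p
    proband = {"AA": u * u, "Aa": 2.0 * u * v, "aa": v * v}  # alpha, beta, gamma
    p_transmit_a = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}          # P(proband passes A)
    from_partner = {"A": p, "a": q}
    result = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    for genotype, f in proband.items():
        from_proband = {"A": p_transmit_a[genotype], "a": 1.0 - p_transmit_a[genotype]}
        for a1, pr1 in from_proband.items():
            for a2, pr2 in from_partner.items():
                n_risk = (a1 == "A") + (a2 == "A")
                result[("aa", "Aa", "AA")[n_risk]] += f * pr1 * pr2
    return result


print(offspring_genotype_probs_enumerated(u=0.2, p=0.1))
```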

Table 6 The calculation procedure of the genotype probabilities for an offspring.


An example of calculation of heritability of a polymorphism

Schizophrenia is chosen here as an example of a common disease. The prevalence, P, of schizophrenia is reported as 0.01. The 16p11.2 duplication (16p11.2 dup) is chosen as an example of a polymorphism (CNV)¹⁸. The frequency of the risk allele in patients, u, is 0.0039, and the frequency of the risk allele in asymptomatic individuals, x, is 0. Therefore, p is calculated as 0.000039 using Equation [1]. A prevalence of 1% corresponds to the part of the general population whose liability exceeds +2.32635 SD, and the mean liability of the patients lies +2.6652 SD from the centre of the normal distribution. Considering only the effect of the CNV, the incidence rate of the disease in a first-degree relative (an offspring) under the autosomal dominant model is given by Equation [7]:

The incidence rate of schizophrenia is then calculated as follows:

This value can be used as the recurrence risk of the disease in first-degree relatives and corresponds to the part of the population whose liability exceeds +2.25998 SD. The heritability (h_p²) of the 16p11.2 duplication is then calculated with Falconer’s liability threshold model as follows⁶:
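The formulas for this example are missing from this copy, but the calculation can be reproduced numerically. A minimal sketch using Python's statistics.NormalDist; it assumes the reconstructed Equation [7], Q_o = P + P(1 − v/q)(Y_O1/X₁ − 1) with Y_O1 = 1 − vq and X₁ = 1 − q², and the simple Falconer estimate h² = (x_g − x_r)/(r·a_g) with r = 1/2. Under these assumptions it reproduces the thresholds quoted above (+2.32635 SD, +2.6652 SD and +2.25998 SD) to the precision printed; whether the final h_p² matches Table 2 depends on the exact form of Falconer's formula used in the paper.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal liability scale

# Inputs for the 16p11.2 duplication in schizophrenia, as given in the text.
P, u, x = 0.01, 0.0039, 0.0

p = P * u + (1.0 - P) * x                          # Equation [1]
v, q = 1.0 - u, 1.0 - p
y_o1 = 1.0 - v * q                                  # offspring risk-genotype frequency (AD)
x_1 = 1.0 - q * q                                   # population risk-genotype frequency (AD)
q_o = P + P * (1.0 - v / q) * (y_o1 / x_1 - 1.0)   # reconstructed Equation [7]

x_g = N.inv_cdf(1.0 - P)            # threshold for the general population
a_g = N.pdf(x_g) / P                # mean liability of patients
x_r = N.inv_cdf(1.0 - q_o)          # threshold for offspring
h2_p = (x_g - x_r) / (0.5 * a_g)    # Falconer, r = 1/2 for first-degree relatives

print(f"Q_o = {q_o:.5f}, x_g = {x_g:.5f}, a_g = {a_g:.4f}, x_r = {x_r:.5f}")
print(f"h_p^2 = {h2_p:.4f}")
```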

Additional Information

How to cite this article: Nagao, Y. Copy number variations play important roles in heredity of common diseases: a novel method to calculate heritability of a polymorphism. Sci. Rep. 5, 17156; doi: 10.1038/srep17156 (2015).

References

1. Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18–21 (2008).
2. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362–9367 (2009).
3. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
4. Wellcome Trust Case Control Consortium et al. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature 464, 713–720 (2010).
5. Zuk, O. et al. Searching for missing heritability: designing rare variant association studies. Proc. Natl. Acad. Sci. USA 111, E455–E464 (2014).
6. Falconer, D. S. The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann. Hum. Genet. Lond. 29, 51–76 (1965).
7. Fanciulli, M., Petretto, E. & Aitman, T. J. Gene copy number variation and common human disease. Clin. Genet. 77, 201–203 (2010).
8. Morris, A. P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990 (2012).
9. Lee, S. H. et al. Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat. Genet. 44, 247–250 (2012).
10. Stefansson, H. et al. Large recurrent microdeletions associated with schizophrenia. Nature 455, 232–236 (2008).
11. Cook, E. H. Jr. & Scherer, S. W. Copy number variations associated with neuropsychiatric conditions. Nature 455, 919–923 (2008).
12. Pawitan, Y., Seng, K. C. & Magnusson, P. K. How many genetic variants remain to be discovered? PLoS One 4, e7969 (2009).
13. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
14. Plomin, R., Haworth, C. M. & Davis, O. S. Common disorders are quantitative traits. Nat. Rev. Genet. 10, 872–878 (2009).
15. Bloom, J. S. et al. Finding the sources of missing heritability in a yeast cross. Nature 494, 234–237 (2013).
16. Hu, T. et al. The genetic equidistance result: misreading by the molecular clock and neutral theory and reinterpretation nearly half of a century later. Sci. China Life Sci. 56, 254–261 (2013).
17. Yuan, D. et al. Scoring the collective effects of SNPs: association of minor alleles with complex traits in model organisms. Sci. China Life Sci. 57, 876–888 (2014).
18. Rees, E. et al. Analysis of copy number variations at 15 schizophrenia-associated loci. Br. J. Psychiatry 204, 108–114 (2014).
19. Weiss, L. A. et al. Association between microdeletion and microduplication at 16p11.2 and autism. N. Engl. J. Med. 358, 667–675 (2008).
20. Weiss, L. A. & Arking, D. E. The Gene Discovery Project of Johns Hopkins the Autism Consortium. A genome-wide linkage and association scan reveals novel loci for autism. Nature 461, 802–808 (2009).
21. Wang, K. et al. Common genetic variants on 5p14.1 associate with autism spectrum disorders. Nature 459, 528–533 (2009).
22. Ronai, Z. et al. Glycogen synthase kinase 3 beta gene structural variants as possible risk factors of bipolar depression. Am. J. Med. Genet. B Neuropsychiatr. Genet. 165B, 217–222 (2014).
23. McMahon, F. J. et al. Bipolar Disorder Genome Study (BiGS) Consortium. Meta-analysis of genome-wide association data identifies a risk locus for major mood disorders on 3p21.1. Nat. Genet. 42, 128–131 (2010).
24. Dow, D. J. et al. ADAMTSL3 as a candidate gene for schizophrenia: gene sequencing and ultra-high density association analysis by imputation. Schizophr. Res. 127, 28–34 (2011).
25. Ripke, S. et al. Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat. Genet. 45, 1150–1159 (2013).
26. Walitza, S. et al. Pilot study on HTR2A promoter polymorphism, -1438G/A (rs6311) and a nearby copy number variation showed association with onset and severity in early onset obsessive-compulsive disorder. J. Neural. Transm. 119, 507–515 (2012).
27. Kato, T. et al. Segmental copy-number gain within the region of isopentenyl diphosphate isomerase genes in sporadic amyotrophic lateral sclerosis. Biochem. Biophys. Res. Commun. 402, 438–442 (2010).
28. van Es, M. A. et al. Genetic variation in DPP6 is associated with susceptibility to amyotrophic lateral sclerosis. Nat. Genet. 41, 1083–1087 (2009).
29. Kudo, H. et al. Frequent loss of genome gap region in 4p16.3 subtelomere in early-onset type 2 diabetes mellitus. Exp. Diabetes Res. 2011, 498460 (2011).
30. SIGMA Type 2 Diabetes Consortium et al. Association of a low-frequency variant in HNF1A with type 2 diabetes in a Latino population. JAMA 311, 2305–2314 (2014).
31. Schultz, S. H., North, S. W. & Shields, C. G. Schizophrenia: a review. Am. Fam. Physician 75, 1821–1829 (2007).


Acknowledgements

I am grateful to Dr. David Schlessinger of the National Institute on Aging for valuable advice.

Author information

Authors and Affiliations

Department of Pediatrics, Takashimadaira Chuo General Hospital, 1-73-1 Takashimadaira, Itabashi, Tokyo, 1750082, Japan
Yoshiro Nagao
Department of Pediatrics, The University of Tokyo, Bunkyo, Tokyo, Japan
Yoshiro Nagao
Department of Clinical Genetics, Tokai University, Isehara, Kanagawa, Japan
Yoshiro Nagao


Contributions

Y.N. designed the study. Y.N. is responsible for the assessment and discussion of the obtained results and wrote the manuscript.

Ethics declarations

Competing interests

The author declares no competing financial interests.

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/


About this article


Received: 14 June 2015
Accepted: 26 October 2015
Springer Nature Limited
