Background

Recent advances in sequencing and genotyping technologies have allowed the association of pathological traits or diseases with common genetic variants observed in human populations. As a result of genome-wide association studies (GWASs), a large number of human diseases have been associated with single nucleotide polymorphisms (SNPs). These small genetic variants can occur in gene-coding regions or in non-coding regions (introns or intergenic regions). SNPs in a coding region can result in a codon that encodes a different amino acid (missense mutation) or in a premature stop signal (nonsense mutation). Depending on the position and/or the chemical properties of the replaced amino acid, these mutations can lead to nonfunctional proteins, causing human genetic diseases such as epidermolysis bullosa [1], sickle-cell anemia [2], mandibuloacral dysplasia [3], SOD1-mediated amyotrophic lateral sclerosis [4] and cancer [5, 6], among others. Due to the degeneracy of the genetic code (i.e., the existence of more codons than the number of different amino acids and stop signals), a SNP does not necessarily change the amino acid sequence of a translated protein. These types of substitutions are known as synonymous SNPs (sSNPs) and, for a long time, they were considered as silent mutations because it was assumed that they had no phenotypic consequences. However, strong evidence currently supports the fact that synonymous codons have phenotypic consequences, and the notion that sSNPs are innocuous has dramatically changed in the last decade. Synonymous mutations can lead to disease by means of four known mechanisms [7]. One of them is by disrupting splicing signals which, in turn, results in the loss, or gain, of one or more gene-coding regions [810]. Another mechanism affects the regulatory binding-sites of transcription factors and miRNAs [11]. For example, Crohn’s disease is caused by synonymous variant rs10065172 within the IRGM coding region, which alters a miR-196 binding site [12]. In the third mechanism, synonymous mutations can change the secondary structure of mRNA causing low protein expression levels. An emblematic example of the latter is the sSNP rs769223. This mutation results in a more stable mRNA molecule, but in lower levels of the COMT protein, leading to higher pain sensitivity [13]. The fourth alternative consists in an alteration of the ribosome-mediated translational attenuation program (i.e., a decrease in the translational rate due to ribosomal pauses) which alters protein conformation (Fig. 1a). In this sense, there is strong evidence supporting the fact that synonymous codons are not always translated in the same manner. On the contrary, some of them are translated faster and/or more accurately than others. There are many studies suggesting that ribosomal pauses schedule co-translational folding of protein domains, and determine the final protein conformation [1424]. This means that sSNPs can alter the ribosome-mediated translational attenuation program with noticeable impact on the final protein configuration and function. This could certainly lead to pathological phenotypes when considering clinically important proteins. In this regard, it has been observed that the drug-transport pump MDR1 protein changes its substrate specificity as a consequence of the sSNP rs1045642 [14], leading to multidrug resistance in cancer cells [25]. Similarly, recently it was proved that the kinetics of co-translational folding (and translation) of the gamma-B crystallin protein is controlled by synonymous codon usage [18].

Fig. 1
figure 1

Ribosome-mediated translational attenuation program and bicodons. a Graphical representation of how a sSNP can alter the ribosome-mediated translational attenuation program and, ultimately, final protein conformation. Consequently, this can affect protein function leading to pathological phenotypes. b Simplified graph showing codons (green boxes) and bicodons (red ellipses) within an mRNA sequence. Owing to the superposition of bicodons, each codon is part of two different bicodons in a given open reading frame. Alterations of the ribosome-mediated translational attenuation program, with the resulting alternative protein conformations and functions, could be encoded by bicodons, rather than by codons

Despite building evidence which suggests that synonymous codons could be a subliminal code for protein folding [2629], the operating mechanism for such a secondary code remains poorly understood. To understand how sequences can control the ribosome-mediated translational attenuation program, it is convenient to review the degeneracy of the genetic code. This redundancy offers a lot of degrees of freedom to codify proteins. Nevertheless, organisms use or explore an insignificant fraction of these available possibilities. This fact is a consequence of many biological constraints (such as protein expression levels, ribosomal proofreading errors, protein solubility, folding accuracy, protein stability) operating to optimize how to convey the correct message from genes to functional proteins. As a consequence of these constraints, there are selective pressures that promote a biased usage of synonymous codons [30, 31]. Since the recruitment of a charged tRNA to the codon in the A-site depends on the abundance of each tRNA [3235], there is a generalized idea that codon usage can modulate translational rates. Another way to modulate the translational rate consists of the non-Watson-Crick (wobble) interactions, since they are usually associated with higher dissociation rates between the mRNA and the decoding center [36]. Nevertheless, transcript translation is a sequential process in which ribosomes synthesize proteins. At each proofreading step the ribosome coordinates several simultaneous or successive tasks, such as the recruitment of a charged tRNA to the codon in the A-site, the assembly of the new residue to the nascent polypeptide, the translocation of the tRNA-mRNA complex, the dissociation of the empty tRNA from the mRNA, and its release from the E-site. Thus, the translational rate is the result of several concomitant processes with different kinetics. In particular, the dissociation process of the mRNA codon from the decoding center has been described as a rate-limiting one [37]. This implies that attenuation of the translational rate not only occurs when the ribosome awaits the entry of scarce tRNA into the A-site, but also during the mRNA translocation process. Consequently, it can be expected that selective pressure operates beyond codon usage. In fact, recent genome-wide statistical analyses have revealed that bicodons, i.e., pairs of consecutive codons, are also subject to evolutionary pressure, and biased bicodon usage has been reported as well [38, 39]. More recently, a genome-wide statistical analysis revealed that some bicodons are overused in sequences associated with highly abundant proteins, but underused when they code for lowly abundant proteins [40]. The opposite situation was also observed, as there are bicodons which are frequently used to codify lowly abundant proteins, but underused in sequences that code for highly abundant ones [40]. Based on the statistical analysis of bicodon usage in sequences associated with lowly or highly abundant proteins, the author estimated a pause propensity measure of all bicodons, and proposed that alternative protein configurations could be associated with alterations in the translational attenuation program encoded by bicodons (Fig. 1 b).

In the light of these previous findings, in this work we reexamined several human genetic diseases associated with sSNPs (compiled by Sauna et al. [41]), by means of computing the relative change in the pause propensity measure due to synonymous bicodon change. We also made the comparison with a similar measure based on the codon frequency usage, and found that in many cases the diseases of Sauna’s list can be better explained by an alteration of the ribosome-mediated translational attenuation program encoded by bicodons rather than by codons. Finally, we showed that the changes in the pause propensity of the sSNPs related to diseases are significantly greater than the pause propensity changes associated with clinically benign sSNPs.

Results

Bicodon bias in the human genome

Since we proposed that synonymous mutations programmed by bicodons can result in alterations in the timing of co-translational folding (which in turn can lead to pathological phenotypes), in this work we focused on the mechanism by which translational rate, and consequently co-translational folding too, are altered due to ribosomal pauses. In a previous article we related bicodon sequences to their pause propensities, by means of counting bicodon occurrence in coding sequences associated with highly or lowly abundant proteins. For illustrative purposes, Fig. 2 a depicts the frequency distributions associated with the bicodons that encode the amino acid pair SK. These distributions were computed using sequences from the low protein abundance (PA) sample (red bars), and from the high PA sample (orange bars). While bicodons TCAAAA, TCAAAG and TCGAAG have similar frequency usage in both sequence samples (low and high PA), many other bicodons have an evident preference for sequences related to low or high PA. In particular, the bicodons TCCAAG and AGCAAG have a high preference for coding proteins associated with low and high abundance, respectively. As shown in [40], the observed bias in bicodons cannot be explained by codon usage. This leads to questions concerning the selective forces that drive this bicodon bias.

Fig. 2
figure 2

Relationship between synonymous bicodons, protein abundances (PA) and pause propensity. a Frequency distributions associated with the bicodons that encode the amino acid pair SK, computed using sequences from the low PA sample (red bars), and from the high PA sample (orange bars). Some bicodons are more frequent within lowly abundant proteins (such as TCCAAG), some within highly abundant ones (such as AGCAAG), and other bicodons have similar frequencies in both groups of proteins (such as TCAAAA, TCAAAG and TCGAAG). b Raster plot of the p-values versus the pause propensity for all bicodons. To improve the visualization of this correlation we plot −S log10[p-value] instead of p-value, where S takes the values +1 or −1 when the bicodon has preference for sequences with low or with high PA, respectively. c Raster plot of the residual scores χ 2 versus the pause propensity and p-values for all bicodons. Small pause propensity values (orange zone) are related to bicodons with high PA, whereas large pause propensity values (red zone) are related to bicodons with low PA. Green lines represent the nine synonymous bicodon variants for the SK amino acid pair that involve a large change in pause propensity (Δ π≥0.754)

Considering that sequences which codify highly abundant proteins need to be optimized in terms of translational rate, we have interpreted this bias by assigning a key role to bicodons in programming translation pauses of the ribosomal machinery. Using the occurrence frequency in each sequence sample, we have computed the pause propensity measure π for each bicodon, and applied the Fisher’s exact test to asses the statistical significance of the usage bias. In addition, we have also computed the residual score χ 2, defined in [42], which indicates when the observed bicodon bias can be explained, or not, by the codon usage bias. Notice that while π expresses a degree of the preference of a given bicodon for coding low PA sequences instead of high PA sequences, the p-value provided by Fisher’s exact test is the probability against the null hypothesis of equal distribution in both sequence samples. The latter can be more influenced by the number of observations rather than by the net difference of bicodon occurrence between samples. Figure 2 b depicts a raster plot between π and the p-value. Note that, for a better visualization, the signed log[p-value] has been plotted.

This figure also shows that a high preference (i.e. high |π|), not necessarily implies a low p-value. However, some bicodons present a significant preference for coding low (red balloon) or high PA sequences (orange balloon).

Figure 2 c depicts two raster plots of the residual scores χ 2 versus the pause propensity and the p-values for all bicodons. Large values of the residual score (χ 2>5) indicate that the observed bias in the bicodon cannot be explained by codon usage [42]. Our working hypothesis is that bicodons with high values of the pause propensity measure, such as bicodon TCCAAG, could be linked to ribosomal pauses (red zone in Fig. 2 c). A synonymous bicodon variant consists of a change in one or more nucleotides of the hexanucleotide, that still codifies for the same amino acid pair. When one of these variants changes the associated pause propensities, it can alter the translational attenuation program leading to protein misfolding and dysfunction. In the case of our particular example, the amino acid pair SK can be codified by 12 synonymous bicodons, allowing 66 alternative synonymous bicodon variants. Nine of such synonymous variants (green lines in Fig. 2 c) involve a relative large change in the pause propensity (Δ π≥0.754). Among these, there is only one that corresponds to a single point mutation: the mutation AGTAAG ⇔ AGCAAG (with π= 0.14 and π= -0.90, respectively, which represents a change of Δ π=1.04). These bicodons have an associated p-value of 0.065 and 0.003, respectively, which means that the preference of bicodon AGCAAG for coding highly abundant proteins is significant. On the other hand, none of the synonymous bicodon variants for the KS amino acid pair involve a large change in the pause propensity. Codon usage bias per se cannot explain the lack of symmetry between amino acid pairs SK and KS, as well as with many other pairs, because both pairs have the same codons but in a different order. An analysis of all possible 420 amino acid pairs (including residue:stop) results in that there are 26718 synonymous bicodon variants, 8497 of which correspond to sSNPs. Figure 3 a displays a histogram of the pause propensity variation of all possible synonymous bicodon variants associated with the sSNPs. Figure 3 b depicts the associated Z-score, or quantile function, where the red region indicates the highest 10% pause propensity variation, i.e., those sSNPs with a pause propensity variation larger than Δ π=0.754. A similar Z-score which was calculated for the measure based on the codon frequency usage is depicted in the Additional file 1: Figure S1.

Fig. 3
figure 3

Pause propensity variation and Z-score. a Histogram of the pause propensity variation of synonymous bicodon variants associated with sSNPs. b Associated Z-score. The red region indicates the highest 10% pause propensity variation, i.e., those sSNPs with a pause propensity variation larger than Δ π=0.754

Analysis of pathological and benign sSNPs in humans

Sauna et al. compiled a list of synonymous mutations in human genes that significantly correlate with 22 human diseases [41]. This compilation only includes those sSNPs that have a significant association with disease or a clinical trait. One of these, the multidrug resistance case, has been confirmed to be associated to changes in protein conformation due to alterations in the translational attenuation program [14]. It is also known that the underlying mechanisms of Crohn’s disease, cystic fibrosis, and temporomandibular joint disorder (TMD) are not related to alterations of the translational attenuation program. On the other hand, in the remaining 18 diseases the reason why the sSNPS lead to pathological conditions remains unknown. In this paper, we analyzed the local sequence context of each compiled sSNP, by computing the pause propensity change due to the sSNP in both bicodons affected by the variation. The results of this analysis are summarized in Table 1, and further details of our analysis for each disease are given in Additional file 2: Text and Additional file 3: Table S1.

Table 1 Pause propensity variation due to sSNPs linked to clinically relevant genetic diseases or traits

We found that the variations of the pause propensity measure associated with the corresponding sSNPs, are low in the case of the following diseases: Crohn’s disease, cystic fibrosis and TMD; whereas it is high in the multidrug resistance case, in perfect agreement with the known mechanisms of these diseases. Performing the same analysis for the remaining 18 diseases with unknown subjacent mechanisms, we found that 14 of these have large associated changes (Z-score ≥0.80) in the pause propensity due to the sSNPs in one or more bicodon variants, in at least one of the genes implicated in the disease. We also observed that the affected bicodons are associated with a significant usage bias, as determined by Fisher’s exact test. For comparison we also computed a similar measure based on codon frequencies, the differential relative synonymous codon usage (RSCU). In contrast with previous observations, the differential RSCU only shows large changes in the case of cervical and vulvar cancer (rs2069763), schizophrenia (rs6277), in agreement with the bicodon analysis, and Crohn’s disease (rs10065172), where a large change in the pause propensity is not expected because, in this case, the sSNP is related to alterations in the miR-196 binding site [12]. Furthermore, the p-value provided by Fisher’s exact test associated with codons does not vary significantly (Additional file 3: Table S1). These results add suggestive evidence supporting the hypothesis that sSNPs that do not alter splicing processes neither mRNA structure, but are associated with diseases, could be involved in the alteration of the ribosome-mediated translational attenuation program. Moreover, this attenuation program could be more related to bicodons than to codons. On the other hand, we did not observe large changes in the pause propensity measure for the remaining four diseases, which could be indicating that they are not associated with alterations of the ribosome-mediated translational attenuation program. However, we found that in two of them, asthma and cervical and vulvar cancer, the corresponding sSNPs introduce rare bicodons in place of more frequent ones. In particular, in the case of asthma (rs2069763), the frequency usage of the GATAGC bicodon over both samples sequences, is 6-fold smaller than GACAGC, while in the case of cancer (rs2069763), the frequency usage of the CTGCTG bicodon is 4-fold higher than the frequency usage of CTTCTG. As these bicodons are not frequent in both the low and high PA samples, we are not able to relate this rareness with the ribosome-mediated translational attenuation program, or with other known mechanisms that lead to a pathological condition. However, it would seem that bicodon frequency could be an indicator of the sSNP pathogenicity. Finally, in the case of non-small-cell lung cancer and attention-deficit/hyperactivity disorder (ADHD), our local sequence context analysis did not show alterations associated with these sSNPs.

To further analyze synonymous mutations which correlate with human diseases, we also performed our local sequence context analysis for sSNPs which have been classified clinically as benign, as a control test. The results of this analysis are summarized in Additional file 4: Table S2. These results indicate that benign sSNPs have an associated lower change in the pause propensity measure, compared to sSNPs which we have associated to alterations of the ribosome-mediated translational attenuation program. In this sense, we used the Mann-Whitney U test to analyze the difference in pause propensities between pathological and benign sSNP groups. This test identified significant differences between both groups (p-value =0.0022). In contrast, a similar analysis with the differential codon usage did not identify significant differences between both groups (p-value =0.41).

Even though we found significant differences in the pause propensities values between pathological and benign sSNP groups, some benign sSNPs (like rs11797, rs2242244, and rs2272068, among others) have large alterations of their pause propensities (see Additional file 4: Table S2). This could mean that a large value of Δ π might be a necessary condition for an alteration of the translational attenuation program. However, this is not a sufficient condition to associate a sSNP with a disease.

Discussion

Genetic diversity allows populations to adapt to environmental changes, making them less prone to extinction. From a clinical point of view, such genetic variability also underlies the distinctive susceptibility of organisms to diseases and their differential sensitivity to toxins or drugs. SNPs are the most common form of such diversity, and account for much of the variation in genetic traits between individuals. It was originally thought that sSNPs had no implications on human health. Nevertheless, this concept has radically changed in the last ten years, since it was shown that the supposed ’silent’ mutations can have an impact on human health through various mechanisms. One of these mechanisms is protein misfolding, which occurs by means of an alteration of the ribosome-mediated translational attenuation program [15]. It was demonstrated that this mechanism operates in the misfolding of membrane protein Pgp encoded by the ABCB1 gene [14], in conformational changes in the FRQ protein of Neurospora [43], in the SufI protein of E. coli [17], and in the bovine gamma-B crystallin [18]. However, it is not known how widespread this mechanism is among the ensemble of clinically relevant genetic traits or diseases.

The main idea underlying the protein-misfolding based mechanism, is that codon usage can modulate ribosome traffic and, consequently, co-translational folding, by means of associating frequent codons with fast elongation rates and rare codons with slow elongation rates. However, the exact mechanism connecting proteostasis with codon usage remains unclear. Recently it was suggested that ribosome translocation from the codon in the P-site to the codon in the A-site depends on both codons, and not only on the codon in the A-site [40]. Consequently, the ribosome-mediated translational attenuation program would be encoded by bicodons, rather than by single codons. The author defined a pause propensity measure based on the preference of each bicodon to encode sequences associated with highly abundant or lowly abundant proteins.

In the light of this new concept, in this article we reviewed 22 genetic diseases or traits associated with synonymous mutations. It is known that the genetic condition for multidrug resistance is a consequence of the protein-misfolding based mechanism [14], whereas three other of these 22 diseases (Crohn’s disease, cystic fibrosis and TMD) are the result of different mechanisms. Nevertheless, the underlying mechanisms of the remaining diseases or traits are still unknown.

Conclusions

Our results, based on the variation of the pause propensities associated with the corresponding sSNPs, are in agreement with the underlying mechanism of the four diseases with known causes. Regarding the other 18 diseases, we have found that 14 of them could be explained by one or more alterations in the translational attenuation program in at least one of the genes implicated in the disease. These alterations, which are results of the sSNPs, consist in large changes in the pause propensity associated with bicodons. We have found that these changes are significantly greater than those related to clinically benign sSNPs. However, a large change in the pause propensity does not necessarily lead to a misfolded protein, even less so to a pathological condition. A pathological condition will be determined by how the function of the protein is affected, and by the role of the pause on protein folding, among others. On the other hand, we have found that differential RSCU has similar values in both pathological and benign sSNP groups. This suggests that synonymous codons could have a secondary role in determining translational pauses, in any case derived from the bicodons that they form.

We believe that the findings presented here, even though preliminary, shed light on genetic diseases associated with sSNPs whose underlying mechanism is based on protein misfolding due to a modification of the translational attenuation program encoded in bicodons. Future experimental studies will test our predictions helping to understand the role of bicodon preferences in proteostasis in a more conclusive manner.

Methods

The methodology that was used in this work comprised two steps: (i) the computation of the p-value provided by the Fisher’s exact test, the residual score and the pause propensity for all bicodons from human coding sequences; and (ii) the computation of the pause propensity variation, and the associated Z-score function, due to the sSNPs linked to each human genetic disease considered here. In our study, we have considered a list of genetic human diseases linked to sSNPs compiled by Sauna et al. [41]. This list comprises 50 sSNPs that have significant association with diseases, but does not include sSNPs with literature reports that identified both synonymous and non-synonymous mutations. From this list, we only considered those SNPs which are validated by 1000Genomes, and also rs769223 which is validated by HapMap. This excluded 15 sSNPs from our study, yielding 35 sSNPs in 27 genes related to 22 clinically relevant genetic traits. Additional details of the sSNPs listed in Table 1 can be found in Additional file 3: Table S1. Futhermore, we also selected all those sSNPs from single nucleotide polymorphism database (dbSNP) [44], which are clinically classified as benign, have been validated by 1000Genomes project, and have a minor allele frequency (MAF) greater than 0.05. We found 87 sSNPs following these criteria. Further details are listed in Additional file 4: Table S2.

Computation of Fisher’s exact test, residual score and pause propensity

The first step for this computation was previously developed in [40]. Briefly, the computation consisted of the frequency count of each bicodon in two samples of coding sequences, one sample associated with lowly abundant proteins and the other sample associated with highly abundant proteins. The sequences constituting each sample were selected according to their abundance, based on the genome-wide protein abundance database (Paxdb) [45, 46]. The nucleotide coding sequences corresponding to the selected proteins were downloaded from the Ensembl web site [47]. Next, we computed bicodon occurrences \(o_{ij}^{X}\) over all coding sequences belonging to a given sample X, where index i denotes the codon corresponding to the P-site, while j denotes the one corresponding to the A-site. After this, we used Fisher’s exact test to examine whether the number of occurrences of bicodons, \(o_{ij}^{L}\), observed in the sequence sample associated with low PA, was significantly different than the number of occurrences observed in the high PA sample \(o_{ij}^{H}\). The p-values of all bicodons are depicted in heat map fashion in Additional file 5: Figure S2. To improve visualization, colors are related to the signed log[p-value], i.e., the quantity −S log10[p-value], where S takes the values +1 or −1 depending on the preference of the bicodon for low or high PA sequences, respectively.

We also computed the residual score for each bicodon by removing the contribution due to the bias in codons and amino acids. The residual score χ 2 is given by [42]:

$$ \chi_{ij}^{2}= \frac{\left(o_{ij} - {\hat e}_{ij}\right)^{2}}{{\hat e}_{ij}}, $$
(1)

where \({\hat e}_{ij}= e_{ij}\times \frac {\sum ^{*}_{kl} o_{kl}}{\sum ^{*}_{kl} e_{kl}} \), (the indicates that the sum is only over bicodons that encode the same amino acid pair encoded by the bicodon ij). e ij is the expected number of occurrences of each bicodon given by \(e_{ij}= f_{i} f_{j} N_{p}/N_{tot}^{2}\), where f i is the number of occurrence of single codons i, N tot is the total number of codons in the set of sequences, and N p is the number of bicodons. We compute the residual score of each bicodon over low and high PA sample sequences separately, and will be denoted by \(\chi ^{2}_{L}\) and \(\chi ^{2}_{H}\), respectively. Thus, we use the total residual score \(\chi _{ij}^{2}=\chi ^{2}_{L} +\chi ^{2}_{H}\) to assess whether the usage bias in bicodon ij can explained by usage bias in codons i and j.

Besides p-values provided by Fisher’s exact test and residual scores χ 2, we also defined a pause propensity measure π that takes the differential frequency usage of bicodons in each sample sequences into account. This measure can be understood in terms of the RSCU index introduced by [48]. RSCU index is defined as RSCU i =s f i /N a . Where s is the number of codons encoding the same amino acid, N a is the frequency of that amino acid, and f i is the frequency of the codon i within the human genome. When RSCU i is greater, or lower, than 1, this indicates that codon i is over-, or under-, represented in the genome, respectively. Thus, we defined the pause propensity measure π as the difference between the relative synonymous bicodon usage computed over the sequence sample associated with low PA, (RSBU L), and the one computed over the sequence sample associated with high PA, (RSBU H). Mathematically,

$$ \pi_{ij}= \text{RSBU}_{ij}^{L} -\text{RSBU}_{ij}^{H} = q \left(f_{ij}^{L} - f_{ij}^{H}\right)/ N_{ap}. $$
(2)

Here, \(f_{ij}^{X}\) is the frequency of the bicodon ij computed over the sequence sample X, q is the number of bicodons encoding the same amino acid pair, and N ap is the frequency of that amino acid pair within the human genome. Thus, a large, or small, value of π ij indicates the preference of bicodon ij for coding low, or high, PA sequences, respectively. The values of π, p-value, and χ 2 for all bicodons are listed in Additional file 6: Table S3.

For the sake of comparison, we computed a similar measure to π but now in terms of codon frequency. We denoted it as differential RSCU \(\left (\text {i.e.,~} \text {RSCU}_{ij}^{L} -\text {RSCU}_{ij}^{H}\right)\). We also performed Fisher’s exact test for codons. The latter allowed us to examine whether the number of occurrences of codons observed in the low PA sample, was significantly different to the number of occurrences observed in the high PA sample.

In our study we have not considered triplets of codons. We believe that the codon associated with the E-site may have low or none effect on ribosomal transit, because this site is not involved in the translocation of the tRNA-mRNA complex. Furthermore, if triplets were to be considered it would be necessary to increase the number of sequences in our samples to a prohibitive size, which would result in poor statistical measures.

Pause propensity variation of genetic human diseases

There are two bicodon variations associated with one SNP. This occurs because as each SNP takes part in one codon, it can be part of two different bicodons, as indicated in Fig. 1 b. For each variation, bicodon i → bicodon f , we computed the pause propensity variation Δ π=π f π i . To assess the statistical significance of this variation we computed a Z-score, also known as the inverse cumulative distribution function. For each probability p in the probability distribution of a random variable X, this score function assigns value x for which Prob (Xx)=p [49]. To this end, we computed Δ π for 8497 synonymous bicodon variants associated to single point mutations. Figure 3 depicts the histogram of Δ π, where the right panel depicts the associated Z-score. The red region indicates the highest 10% pause propensity variation, i.e., those sSNPs with a pause propensity variation larger than Δ π=0.754. A similar Z-score function was computed for the differential RSCU obtained for codons. Additional file 1: Figure S1, depicts the associated Z-score function for codons. The Z-score values for bicodons and codons associated with the studied sSNPS are listed in the last column of Tables S1 and S2. These tables also list the relative change of the p-value defined as the ratio (pv f pv i )/pv i , where pv i and pv f are the signed log [p-value] of bicodon i and bicodon f, respectively.