Amino acid compositions contribute to the proteins’ evolution under the influence of their abundances and genomic GC content

Du, Meng-Ze; Liu, Shuo; Zeng, Zhi; Alemayehu, Labena Abraham; Wei, Wen; Guo, Feng-Biao

doi:10.1038/s41598-018-25364-1

Amino acid compositions contribute to the proteins’ evolution under the influence of their abundances and genomic GC content

Article
Open access
Published: 09 May 2018

Volume 8, article number 7382, (2018)
Cite this article

Download PDF

You have full access to this open access article

Scientific Reports

Amino acid compositions contribute to the proteins’ evolution under the influence of their abundances and genomic GC content

Download PDF

Meng-Ze Du¹,
Shuo Liu¹,
Zhi Zeng¹,
Labena Abraham Alemayehu¹,
Wen Wei² &
…
Feng-Biao Guo^1,3,4

5303 Accesses
15 Citations
Explore all metrics

Abstract

Inconsistent results on the association between evolutionary rates and amino acid composition of proteins have been reported in eukaryotes. However, there are few studies of how amino acid composition can influence evolutionary rates in bacteria. Thus, we constructed linear regression models between composition frequencies of amino acids and evolutionary rates for bacteria. Compositions of all amino acids can on average explain 21.5% of the variation in evolutionary rates among 273 investigated bacterial organisms. In five model organisms, amino acid composition contributes more to variation in evolutionary rates than protein abundance, and frequency of optimal codons. The contribution of individual amino acid composition to evolutionary rate varies among organisms. The closer the GC-content of genome to its maximum or minimum, the better the correlation between the amino acid content and the evolutionary rate of proteins would appear in that genome. The types of amino acids that significantly contribute to evolutionary rates can be grouped into GC-rich and AT-rich amino acids. Besides, the amino acid with high composition also contributes more to evolutionary rates than amino acid with low composition in proteome. In summary, amino acid composition significantly contributes to the rate of evolution in bacterial organisms and this in turn is impacted by GC-content.

Translation Comes First: Ancient and Convergent Selection of Codon Usage Bias Across Prokaryotic Genomes

Article 26 September 2022

The Influence of Codon Usage, Protein Abundance, and Protein Stability on Protein Evolution Vary by Evolutionary Distance and the Type of Protein

Article 11 February 2022

The fundamental tradeoff in genomes and proteomes of prokaryotes established by the genetic code, codon entropy, and physics of nucleic acids and proteins

Article Open access 12 December 2014

Introduction

The rate and mechanism of protein sequence evolution have been central questions in evolutionary biology since the 1960s¹. The rate of protein evolution is generally thought to reflect the relative importance of selection and genetic drift, and is used to identify selective forces acting on genomes². Protein sequence evolution has been investigated at the level of the DNA codons by examining amino acids and codons³, Rates of protein evolution are usually estimated by calculating the number of amino acid substitutions per site between a pair of orthologous proteins¹. The ratio of the number of nonsynonymous nucleotide substitutions per nonsynonymous site (Ka) to the number of synonymous nucleotide substitutions per synonymous site (Ks) is frequently computed to assay the strength and direction of selection^4,5,6. Theoretical and empirical analyses have demonstrated the importance of selection against errors in molecular and cellular processes, i.e., purifying selection in protein evolution. Determining the selection pressures of protein evolution that have shaped genetic variation forms a major part of many studies of molecular evolution.

Amino acids tend to be gained and lost with a universal trend during protein evolution⁷, and the biased amino acid usage is thought to be related to the molecular weights, protein structure, and the cost selection for synthesis^8,9,10. However, our knowledge on the relationship of amino acid composition and the substitution rate are not consistent. Graur reported a highly significant correlation between the nonsynonymous substitution rate (Ka) and amino acid composition in mammalian proteins (R²: ~0.38), and thus he proposed that composition and changeability of amino acids are the main factor determining evolutionary rate rather than other factors such as functionality¹¹. Conversely, another work concluded that rates of protein evolution was only weakly affected by amino acid compositions (R²: <0.10)¹². By constructing an integrated probabilistic modeling approach in Saccharomyces cerevisiae, it was observed that amino acid composition together with protein abundance strongly contribute to the models of predicting slowly evolving proteins¹³. Recent research investigated the evolutionary patterns of amino acids in eight primates and concluded that amino acid usage was an important factor for protein evolution¹⁴. Although these findings hint at a close relationship between amino acid composition and evolution, direct computational and experimental results have been rare for bacterial organisms.

Thus, we systematically investigated the relationship between amino acid composition and evolutionary rates in 273 bacteria belonging to 18 phyla. The evolutionary rate Ka/Ks, which reflects the type and extent of selection pressure acting on genes¹⁵, was supposed to correlate with the compositions of 20 amino acids. In this work, we constructed multiple linear regression models (MLR) between evolutionary rates and amino acid composition for these bacterial species using ridge regression. Moreover, significant linear models were adopted to further analyze the contribution of each amino acid to evolutionary rate. Comparing individual contributions of the 20 amino acids, we found that the genomic GC content and amino acid richness had effects on the contribution of amino acid compositions to the evolutionary rates. This work confirmed the correlation between amino acid composition and Ka/Ks in bacteria and revealed that GC content and richness of amino acids influence their contributions to evolutionary rates.

Materials and Methods

Related species

There are 273 genome pairs, belonging to 18 phyla (Fig. 1A), used to calculate the evolutionary rates of proteins (Table S1). For organisms with multiple chromosomes, the maximal chromosome was used. The following steps were applied to determine the reference genome. We downloaded all bacterial genomes in year 2017 from Genbank, which is a comprehensive public database for nucleotide sequences and biological annotation¹⁶. Organisms can be clustered into species, genuses, families, then orders, then classes, then phyla, and finally into kingdoms. We employed the taxonomy of bacteria to determine the best reference organisms: for genomes belonging to the same genus (if none, then the same family/order), we choose the genome having similar genome size as the reference of the target genomes. For example, the reference of NC_000117 (Chlamydia trachomatis D/UW-3/CX) is NC_015408 which belongs to the same genus Chlamydia containing more than 50 organisms.

Evolutionary rates

Orthologous gene pairs between each genome pair were identified based on reciprocal best hits using the protein-protein BLAST service Blastp (https://blast.ncbi.nlm.nih.gov/Blast.cgi)^17,18 with criteria of E <10⁻⁵, 80% minimum residues that could be aligned, and 30% identity. Protein sequences encoded by identified orthologous gene pairs were aligned with ClustalW¹⁹, and then back-translated into nucleotide sequences based on their original sequences. Numbers of substitutions per nonsynonymous site (Ka) and numbers of substitutions per synonymous site (Ks) were calculated following Yang’s definition using the PAML package^20,21 with default parameters.

Frequency of optimal codons

Codon usage bias for each gene was measured by the frequency of optimal codons (Fop). The optimal codons were described as that the most frequently used ones in a set of highly expressed genes for a certain species^22,23,24. The ribosome proteins were used as the referenced highly expressed proteins. For a total of 273 organisms, Fops ranged from 0.36 (genes with the same codon bias) to 1.0 (genes with a strong codon preference).

Protein abundance

The data of protein abundance was acquired from Paxdb (http://pax-db.org) which reprocessed, unified, quality-scored, and then integrated those qualification data²⁵. The integrated data for five model organisms were downloaded: Bacillus subtilis subsp. subtilis str. 168 (B. subtilis), Desulfovibrio vulgaris str. Hildenborough (D. vulgaris), Escherichia coli str. K-12 substr. MG1655 (E. coli), Mycobacterium tuberculosis H37Rv (M. tuberculosis), Neisseria meningitidis MC58 (N. meningitidis), and Streptococcus pyogenes M1 GAS (S. pyogenes).

Statistic methods

To avoid multi-collinearity, we adopted the ridge regression package in R language to choose variables^26,27. Then the multivariate linear regression models were constructed using the chosen variables and evolutionary rates. Finally, we acquired 273 linear models with significant P values (<0.05). Furthermore, we also tried the principal component regression and the final total decision coefficients are close to the results of ridge regression we adopted in this paper, while the latter method can help us to refine to special amino acids.

Results and Discussion

Evolutionary rates significantly correlate with amino acid compositions

Above mentioned inconsistent results have been reported on the correlation between the amino acid compositions and evolutionary rates in eukaryotes. In this work, we investigated the relationship between amino acid composition and evolutionary rate Ka/Ks for bacteria. The Ka/Ks can be viewed as the outcome variable in a regression setting, with the amino acid compositions as predictors. Then, our aim is to construct a model to predict the evolutionary rate using the amino acid compositions for proteins in an organism. However, the high-dimensional data and the complex relationships between amino acids could influence the steady state of the model. Ridge regression is a means of estimating regression coefficients when data are high-dimensional and/or contain correlated variables^27,28, and it can be used to obtain stable parameter estimates through guiding the variable selection (to select those amino acid compositions which have significant effects on evolutionary rate).

These analyses were applied to 273 bacterial proteomes to construct linear models. All multivariate linear regression models (MLR) are significant with the average total decision coefficient (R²) being 0.215 (P < 0.05), which means the amino acid compositions can explain average 21.5% of evolutionary rates’ variation (Fig. 1B, Table S1). These results show that amino acid composition generally correlate with the evolutionary rate in bacteria.

For some organisms, the coefficients of determination (R²) are very low. For example, Dehalococcoides CBDB1 uid58413, Escherichia coli ED1a uid59379 and Bordetella bronchiseptica RB50, the corresponding R² are 0.0238, 0.0420 and 0.0472. However, some organisms have higher coefficients of determination (R²), such as Buchnera aphidicola Tuc7, Buchnera aphidicola 5 A (Acyrthosiphon pisum), Buchnera aphidicola cinara tujafilinat, the R² are 0.4384, 0.4553 and 0.5206. Comparing with the genomes with lower R², the three genomes are small sized and AT-rich. Hence, genomic GC content and genome size may influence R² (or contribution strength) of that genome. Further linear regression was performed between R² and GC content/genome size, and the results showed that R² correlate with GC content/genome size (Fig. 1C,D). And using different reference genome will acquire different value of Ka, Ks and Ka/Ks, and the R² for the MLR may change. For example, we choose Bacillus subtilis spizizenii W23 uid51879 as the reference organism of Escherichia coli ED1a uid59379, the R² is lifted from 0.0420 to 0.1671. The 10% genomes with the highest R² own a slightly higher Ka (0.2029 vs 0.2250), Ks (2.5407 vs 2.9360) and Ka/Ks (0.0996 vs 0.1016) than that of the 10% genomes with the lowest R² (Student’s t test: P < 2.2e-16). However, the difference of R² for MLR caused by the choice of reference genome is unavoidable and under control. In this work, the impact was minimized by choosing the reference genomes based on the phylogenetic relationship of organisms, and restrictions on the average evolutionary rate Ka/Ks (<5).

Other factors may constrain the correlation between the amino acid compositions and evolutionary rate. Expression level has been identified as a leading determinant of the protein variation in the rate of sequence evolution among genes encoded in the same genome²⁹. It is previously reported that expression level strongly predicts the evolutionary rates (Ka) for yeast proteins³⁰. Additionally, a limited but statistically significant negative correlation between Fop and Ka/Ks was reported, which is indicative of a link between selection on protein sequence and selection on codon usage³¹. For five model organisms, two MLRs are constructed: one is for evolutionary rates and multi amino acid compositions, and the other is for the evolutionary rates and variables including amino acid compositions, Fop and abundance. Comparing with the first MLRs, the second MLRs have increased/decreased R² for the five organisms (Table 1). The average evolutionary rates for proteins in these genomes ranged from 0.1~0.3 (Fig. 1E). The results showed that expression level and Fop tend to negatively correlate with evolutionary rate which is consistent with the results of previous researches¹. The linear models showed that some amino acids contribute positively to the Ka/Ks, while other amino acids contribute negatively to the Ka/Ks (Table S1). Finally, our results showed that the amino acid compositions can predict more of the variance among evolutionary rates than does protein abundance and Fop.

Table 1 The linear models for evolutionary rates and abundance/Fop/amino acid compositions.

Full size table

Further comparative analyses were performed to check whether organisms with close evolutionary relationship have similar tendency of amino acid contributions to evolutionary rates. To acquire the predictive power of constructed regression models, we compared the linear models for organisms from one genus. The model organism Escherichia coli have 29 different strains can be downloaded from Genbank¹⁶. When take NC_011740 as reference genome, we found that amino acid Cys (C) and Ser (S) positively contribute to evolutionary rates, while Gly (G) negatively contribute to evolutionary rates in the built linear models for all these 29 strains. When take NC_014479 as reference genome, we found that amino acid Leu (L) and Val (V) positively contribute to evolutionary rates, while Tyr (Y) negatively contribute to evolutionary rates in the built linear models for all these 29 strains (Table S2). The other genomes which belong to one genus also show similar tendency of amino acid contributions to evolutionary rates (Table S1). To some degree, we can predict the contribution of each amino acid to the evolutionary rate for closely related species with the constructed regression models.

GC content influences the contribution of amino acids to evolutionary rates

The base composition of genomic sequences varies widely. GC-content, determining the underlying causes (selective or neutral) of base variations, is a major issue in genetics^32,33. Bohlin et al. reported that amino acid usage is strongly linked with genomic AT content³⁴ and Zhou et al. demonstrated that base usage, codon usage patterns and amino acid usage change with GC content with a linear correlation in prokaryotic organisms³⁵. Thus, GC content plays a role both in the evolution of proteins and shaping amino acid composition. The abovementioned result showed that the R² varied among different genomes and could the GC content be one of the controlling factors?

In this work, the coefficients of determination R² for the MLR represent the intensity of the amino acid contribution to the evolutionary rates. Through the scatter plot where the x lab is GC content and the y lab is the coefficient of determination for MLR, we found two distinct patterns between GC-rich genomes and AT-rich genomes (Fig. 2A). For GC-rich genomes, the coefficient of determination grows along with the GC-content, while in AT-rich ones it grows with the AT-content. These results indicate that amino acids contribute more to evolutionary rates in organisms with either high GC content or high AT content.

The above results may be caused by biased compositions of amino acids encoded by codons with different GC contents. We thus extract those AAs which positively contribute to the evolutionary rate of the 273 linear models. The histograms of these amino acids show that GC-rich organisms and AT-rich organisms display different categories of AAs which contribute to evolutionary rates (Fig. 2B). The amino acids encoded by AT-rich codons (such as Phe, Ile, Tyr, Asn, Lys and so on) tend to positively contribute to MLRs for AT-rich organisms and negatively contribute to MLRs for GC-rich organisms. The amino acids encoded by GC-rich codons (Pro, Gly, Arg and Ala) tend to positively contribute to MLRs for GC-rich organisms and negatively contribute to MLRs for AT-rich organisms. The ratios of AT content to GC content for all codons encoding these amino acids are: Asn-5:1, Ile-5:1, Phe-5:1, Lys-5:1, Tyr-5:1, Ala-1:5, Pro-1:5, Arg-1:5, Gly-1:5. The results showed that in GC-rich organisms, amino acids owing more GC-rich codons contributed to the evolutionary rates more; while in AT-rich organisms, AAs owing more AT-rich codons contributed to the evolutionary rates more.

If the GC content largely determine the correlation between amino acids and evolutionary rates, the content of these amino acids among organisms should also change with the GC contents. Thus, the average amino acid compositions of the investigated 273 organisms were calculated. The four GC-rich amino acids Ala, Pro, Arg and Gly (A, P, R and G) positively correlate with GC contents of the organisms. The five AT-rich amino acids Phe, Ile, Tyr, Asn and Lys (F, I, Y, N and K) positively correlate with AT contents of the organisms (Fig. 2C). All these results supported that amino acid usage contribute to the evolutionary rates under the influence of GC-content.

The content of special amino acid influences its contribution to evolutionary rates

From the above results, we observed that amino acid composition is partially influenced by GC content. But, the GC content is not the only factor influence amino acid compositions. We also observed that amino acids Asp and Glu (D and E) tend to be conservative in most organisms, while Leu and Ser (L and S) always contribute to the MLRs in most organisms. For organisms with GC content in range 45~55%, compositions of amino acid Leu and Ser positively correlate with the evolutionary rates, while the compositions of amino acids Asp and Glu negatively correlate with the evolutionary rates (Fig. 3A). In contrast, a significantly higher content of Leu than Asp and Glu were observed (Fig. 3B). Although the content of Ser is not higher than the content of Asp and Glu, the Ser is the precursor of amino acids Gly and Cys, according to the amino acid biosynthetic pathways⁹, which may result in that higher correlation between compositions of Ser and the evolutionary rates. If richer amino acids tend to contribute more to evolutionary rates than the rest, then these results can be explained.

To further prove that the richness of amino acid influences their contribution to evolutionary rates, we investigated the contributions of richest amino acids. Among the 273 linear models, the richest amino acids positively contribute in 71% of inspected organisms, and the rarest amino acids only positively contribute in 34% of those organisms. In 65 linear models where the richest and rarest amino acids both significantly contribute, the contributions of richest amino acids are lower than that of the rarest amino acids only in 9 organisms. Additionally, the contributions of the richest amino acids in 273 organisms not only higher than the rarest amino acids (Students’ t test: P < 2.2e-16), they also are higher than the average of the rest 19 types of amino acids (Students’ t test: P < 2.2e-16). For these organisms, the richest amino acids for the 52%, 30%, 14% and 4% genomes are Leu, Ala, Ile and Lys. This help to explain why in most organisms the compositions of amino acid Leu positively contribute to the evolutionary rates. Next, comparison of the contributions to evolutionary rates by the richest AAs and the rest 19 AAs show that richest AAs absolutely contribute more than the rest (Fig. 3C).

Our result showed that the higher the usage of special amino acid in proteome, the higher contributions the amino acid has to the evolutionary rate for this organism. The metabolic efficiency may result in the variation of amino acid composition. However, the statistical aberrations also cannot be ignored through magnifying the contribution of richest amino acids. This finding can be applied to understand, predict and even affect the evolution of leucine-rich genes³⁶, alanine-rich genes³⁷ and others.

Discussion

Understanding the sequence and structure of proteins is important in understanding genome evolution. We looked into the relation between amino acid compositions and evolutionary rates in multiple organisms. Our results supported the conclusion that amino acid composition generally correlates with the evolutionary rates in bacterial organisms. Through ridge regression and multivariate linear regression, we acquired the contributions of different amino acids. Connecting these determination coefficients of MLR with the GC content and the rank of amino acid composition, we observed a trend for the co-variation of amino acid composition and evolutionary rate.

The result showed that the closer the GC-content of one genome to its maximum or its minimum, the better the correlation between the amino acid content and the evolutionary rate of proteins for that genome. Why evolutionary rates in genomes with extreme GC content have better correlation with amino acid compositions? In this paper we measured the evolutionary rate through the ratio between Ka and Ks for genes. Any increase/decrease of each of the two indexes can cause the ratio change. A prominent result showed that for GC-rich genomes, the usage of GC-rich amino acids (such as Ala, Pro, Arg and Trp) increase (Fig. 2C). However, the increase in the usage of these GC-rich amino acids under the influence of GC pressure may be limited by negative selection. If it is allowed, all other amino acids are turning to GC-rich amino acids (such as Ala, Pro, Arg and Trp) throughout the whole sequence. The Ka/Ks also represent the selective pressure^38,39,40, and thus we compared the selective pressure for genomes with different GC-content. Average Ka/Ks for each genome was computed, and GC-rich organism (GC content > 64%) and AT-rich genomes (GC < 32%) generally have lower Ka/Ks than Genomes with GC content between 47% and 53% (Fig. 4, students’ t test: P = 4.111e-07). The decrease of Ka/Ks ratio in GC-richest or AT-richest genomes possibly reflects a stronger negative selection for genes of them. But it was also caused by the decrease of mutation pressure for the whole genomes. This discovery can help us to understand how the GC content influences the amino acid composition during evolution.

Those richest amino acids generally positively contribute to evolutionary rates and their corresponding contributions always are higher than the others. We also found that Leu is so abundant in proteins, while Cys and Trp are so rare in proteins. There is no direct report to explain this observation. It can be understood from the view of energy efficiency. Chen et al. reported that efficiency trade-offs drive nucleotide usage in transcribed regions, which means cheaper nucleotides encode more expensive amino acids⁴¹. The synthesis cost of A + G > U + C and the Leu codons have more U + C. Thus, it has moderate nucleotide cost. In addition, Leu is one amino acid with moderate energy cost, while Trp is the most expensive amino acid⁴². The protein structure may also result in high abundance of Leu and low abundance of Trp. For example, the common leucine-rich repeat motif such as leucine zipper needs enough Leu⁴³. Because of containing sulfur, the Cys composition in proteins is limited⁴⁴. Furthermore, whether the algorithm for the alignment might contribute something into the obtained results that lower abundance amino acids have lower contributions to the evolutionary rates? Usually such rare amino acids as Cys and Trp have the higher score for the alignment than Leu. Probably, Cys and Trp rich proteins are not just evolve slower, but they are also aligned better⁴⁵. More direct evidences are needed to explain the abundance of these amino acids and their evolution mechanism, which can finally help us understand the effects of amino acid composition to evolutionary rates.

References

Zhang, J. & Yang, J. R. Determinants of the rate of protein sequence evolution. Nat Rev Genet 16, 409–420 (2015).
Article CAS PubMed PubMed Central Google Scholar
Pál, C., Papp, B. & Lercher, M. J. An integrated view of protein evolution. Nature reviews. Genetics 7, 337 (2006).
Article PubMed Google Scholar
Kosiol, C., Holmes, I. & Goldman, N. An empirical codon model for protein sequence evolution. Mol Biol Evol 24, 1464–1479 (2007).
Article CAS PubMed Google Scholar
Hurst, L. D. The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet 18, 486 (2002).
Article PubMed Google Scholar
Nei, M. & Kumar, S. Molecular evolution and phylogenetics. (Oxford university press, 2000).
Yang, Z., Wong, W. S. & Nielsen, R. Bayes empirical bayes inference of amino acid sites under positive selection. Mol Biol Evol 22, 1107–1118 (2005).
Article CAS PubMed Google Scholar
Jordan, I. K. et al. A universal trend of amino acid gain and loss in protein evolution. Nature 433, 633–638 (2005).
Article ADS CAS PubMed Google Scholar
Seligmann, H. Cost-minimization of amino acid usage. J Mol Evol 56, 151–161 (2003).
Article ADS CAS PubMed Google Scholar
Akashi, H. & Gojobori, T. Metabolic efficiency and amino acid composition in the proteome of Escherichia coli and Bacillus subtilis. Proc Natl Acad Sci USA 99, 3695–3700 (2002).
Article ADS CAS PubMed PubMed Central Google Scholar
Das, S., Paul, S., Chatterjee, S. & Dutta, C. Codon and Amino Acid Usage in Two Major Human Pathogens of Genus Bartonella — Optimization Between Replicational-Transcriptional Selection, Translational Control and Cost Minimization. Dna Research An International Journal for Rapid Publication of Reports on Genes & Genomes 12, 91 (2005).
Article CAS Google Scholar
Graur, D. Amino acid composition and the evolutionary rates of protein-coding genes. J Mol Evol 22, 53–62 (1985).
Article ADS CAS PubMed Google Scholar
Tourasse, N. J. & Li, W. H. Selective constraints, amino acid composition, and the rate of protein evolution. Mol Biol Evol 17, 656–664 (2000).
Article CAS PubMed Google Scholar
Xia, Y., Franzosa, E. A. & Gerstein, M. B. Integrated assessment of genomic correlates of protein evolutionary rate. PLoS Comput Biol 5, e1000413 (2009).
Article ADS PubMed PubMed Central Google Scholar
Liu, H., Xie, Z., Tan, S., Zhang, X. & Yang, S. Relationship between amino acid usage and amino acid evolution in primates. Gene 557, 182–187 (2015).
Article CAS PubMed Google Scholar
Mugal, C. F., Wolf, J. B. & Kaj, I. Why time matters: codon evolution and the temporal dynamics of dN/dS. Mol Biol Evol 31, 212–231 (2014).
Article CAS PubMed Google Scholar
Benson, D. A. et al. GenBank. Nucleic Acids Research 45, D37–D42 (2017).
Article CAS PubMed Google Scholar
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J Mol Biol 215, 403–410 (1990).
Article CAS PubMed Google Scholar
McGinnis, S. & Madden, T. L. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nuleic Acids Res 32, W20–W25 (2004).
Article CAS Google Scholar
Larkin, M. A. et al. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948 (2007).
Article CAS PubMed Google Scholar
Yang, Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Bioinformatics 13, 555–556 (1997).
Article CAS Google Scholar
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Molecular biology and evolution 24, 1586–1591 (2007).
Article CAS PubMed Google Scholar
Duret, L. & Mouchiroud, D. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc Natl Acad Sci USA 96, 4482–4487 (1999).
Article ADS CAS PubMed PubMed Central Google Scholar
Qin, W. Y. et al. New insights into the codon usage patterns of the bactericidal/permeability-increasing (BPI) gene across nine species. Gene 616, 45–51 (2017).
Article CAS PubMed Google Scholar
Stenico, M., Lloyd, A. T. & Sharp, P. M. Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases. Nucleic Acids Res 22, 2437–2446 (1994).
Article CAS PubMed PubMed Central Google Scholar
Wang, M., Herrmann, C. J., Simonovic, M., Szklarczyk, D. & Mering, C. Version 4.0 of PaxDb: protein abundance data, integrated across model organisms, tissues, and cell‐lines. Proteomics 15, 3163–3168 (2015).
Article CAS PubMed Google Scholar
Ihaka, R. & Gentleman, R. R: a language for data analysis and graphics. Journal of computational and graphical statistics 5, 299–314 (1996).
Google Scholar
Cule, E. & De Iorio, M. A semi-automatic method to guide the choice of ridge parameter in ridge regression. arXiv preprint arXiv:1205.0686 (2012).
Hoerl, A. E. & Kennard, R. W. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 42, 80–86 (2000).
Article MATH Google Scholar
Park, C., Chen, X., Yang, J. R. & Zhang, J. Differential requirements for mRNA folding partially explain why highly expressed proteins evolve slowly. Proc Natl Acad Sci USA 110, E678–686 (2013).
Article CAS PubMed PubMed Central Google Scholar
Drummond, D. A., Bloom, J. D., Adami, C., Wilke, C. O. & Arnold, F. H. Why highly expressed proteins evolve slowly. Proc Natl Acad Sci USA 102, 14338–14343 (2005).
Article ADS CAS PubMed PubMed Central Google Scholar
Ran, W., Kristensen, D. M. & Koonin, E. V. Coupling Between Protein Level Selection and Codon Usage Optimization in the Evolution of Bacteria and Archaea. Mbio 5, 00956–00914 (2014).
Article Google Scholar
Lassalle, F. et al. GC-Content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet 11, e1004941 (2015).
Article PubMed PubMed Central Google Scholar
Khrustalev, V. V., Khrustaleva, T. A., Sharma, N. & Giri, R. Mutational Pressure in Zika Virus: Local ADAR-Editing Areas Associated with Pauses in Translation and Replication. Frontiers in Cellular & Infection Microbiology 7, 44 (2017).
Article Google Scholar
Bohlin, J., Brynildsrud, O., Vesth, T., Skjerve, E. & Ussery, D. W. Amino acid usage is asymmetrically biased in AT- and GC-Rich microbial genomes. Plos One 8, e69878 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhou, H. Q., Ning, L. W., Zhang, H. X. & Guo, F. B. Analysis of the Relationship between Genomic GC Content and Patterns of Base Usage, Codon Usage and Amino Acid Usage in Prokaryotes: Similar GC Content Adopts Similar Compositional Frequencies Regardless of the Phylogenetic Lineages. Plos One 9, e107319 (2014).
Article ADS PubMed PubMed Central Google Scholar
Dreyer, H. C. et al. Leucine-enriched essential amino acid and carbohydrate ingestion following resistance exercise enhances mTOR signaling and protein synthesis in human muscle. American journal of physiology. Endocrinology and metabolism 294, E392–E400 (2008).
Article CAS PubMed Google Scholar
Harlan, D. M. et al. The human myristoylated alanine-rich C kinase substrate (MARCKS) gene (MACS). Analysis of its gene product, promoter, and chromosomal localization. Journal of Biological Chemistry 266, 14399–14405 (1991).
CAS PubMed Google Scholar
Dias, L. M. et al. Genomic Architecture of the Two Cold-Adapted Genera Exiguobacterium and Psychrobacter: Evidence of Functional Reduction in the Exiguobacterium antarcticum B7 Genome. Genome Biology and Evolution 10, 731–741 (2018).
Article PubMed PubMed Central Google Scholar
Lynch, M. et al. Genetic drift, selection and the evolution of the mutation rate. Nature Reviews Genetics 17, 704–714 (2016).
Article CAS PubMed Google Scholar
Bashir, T. et al. Hybridization Alters Spontaneous Mutation Rates in a Parent-of-Origin-Dependent Fashion in Arabidopsis. Plant Physiology 165, 424–437 (2014).
Article CAS PubMed PubMed Central Google Scholar
Chen, W. H., Lu, G., Bork, P., Hu, S. & Lercher, M. J. Energy efficiency trade-offs drive nucleotide usage in transcribed regions. Nature Communications 7, 11334 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Akashi, H. & Gojobori, T. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proceedings of the National Academy of Sciences of the United States of America 99, 3695–3700 (2002).
Article ADS CAS PubMed PubMed Central Google Scholar
Wei, T. et al. LRRML: a conformational database and an XML description of leucine-rich repeats (LRRs). BMC Structural Biology 8, 47–47 (2008).
Article PubMed PubMed Central Google Scholar
Kimura, Y. et al. 3-Mercaptopyruvate sulfurtransferase produces potential redox regulators cysteine- and glutathione-persulfide (Cys-SSH and GSSH) together with signaling molecules H(2)S(2), H(2)S(3) and H(2)S. Scientific Reports 7, 10459 (2017).
Article ADS PubMed PubMed Central Google Scholar
Mills, L. J. & Pearson, W. R. Adjusting scoring matrices to correct overextended alignments. Bioinformatics 29, 3007–3013 (2013).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported by the Tianjin Natural Science Foundation of China [16JCQNJC09600], National Natural Science Foundation of China [31470068, 31501063] and the Fundamental Research Funds for the Central Universities of China [ZYGX2015J144, ZYGX2015Z006 and ZYGX2016J117].

Author information

Authors and Affiliations

School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
Meng-Ze Du, Shuo Liu, Zhi Zeng, Labena Abraham Alemayehu & Feng-Biao Guo
School of Life Sciences, Chongqing University, Chongqing, China
Wen Wei
Centre for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
Feng-Biao Guo
Key Laboratory for Neuroinformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
Feng-Biao Guo

Authors

Meng-Ze Du
View author publications
You can also search for this author in PubMed Google Scholar
Shuo Liu
View author publications
You can also search for this author in PubMed Google Scholar
Zhi Zeng
View author publications
You can also search for this author in PubMed Google Scholar
Labena Abraham Alemayehu
View author publications
You can also search for this author in PubMed Google Scholar
Wen Wei
View author publications
You can also search for this author in PubMed Google Scholar
Feng-Biao Guo
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

F.B.G. conceived the study. M.Z.D. performed the bioinformatics analyses and S.L. double-checked them. Z.Z., L.A.A. and W.W. took part in downloading the data and double-checking some parts of the results. M.Z., W.W. and F.B.G. wrote, and F.B.G. and W.W. revised the paper. All authors approved the paper.

Corresponding authors

Correspondence to Wen Wei or Feng-Biao Guo.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Du, MZ., Liu, S., Zeng, Z. et al. Amino acid compositions contribute to the proteins’ evolution under the influence of their abundances and genomic GC content. Sci Rep 8, 7382 (2018). https://doi.org/10.1038/s41598-018-25364-1

Download citation

Received: 30 October 2017
Accepted: 16 April 2018
Published: 09 May 2018
DOI: https://doi.org/10.1038/s41598-018-25364-1
Springer Nature Limited

This article is cited by

The genome of Candidatus phytoplasma ziziphi provides insights into their biological characteristics
- Chaoling Xue
- Yao Zhang
- Jin Zhao
BMC Plant Biology (2023)
Detection, characterization and evolutionary aspects of S54LP of SP (SAP54 Like Protein of Sesame Phyllody): a phytoplasma effector molecule associated with phyllody development in sesame (Sesamum indicum L.)
- Amrita Singh
- Suman Lakhanpaul
Physiology and Molecular Biology of Plants (2020)
Metabolic diversification of anaerobic methanotrophic archaea in a deep-sea cold seep
- Wen-Li Li
- Yu-Zhi Wu
- Yong Wang
Marine Life Science & Technology (2020)

Amino acid compositions contribute to the proteins’ evolution under the influence of their abundances and genomic GC content

Abstract

Similar content being viewed by others

Translation Comes First: Ancient and Convergent Selection of Codon Usage Bias Across Prokaryotic Genomes

The Influence of Codon Usage, Protein Abundance, and Protein Stability on Protein Evolution Vary by Evolutionary Distance and the Type of Protein

The fundamental tradeoff in genomes and proteomes of prokaryotes established by the genetic code, codon entropy, and physics of nucleic acids and proteins

Introduction