Abstract
The average amino acid identity (AAI) is an index of pairwise genomic relatedness, and multiple studies have proposed its application in prokaryotic taxonomy and related disciplines. AAI demonstrates better resolution in elucidating taxonomic structure beyond the species rank when compared with average nucleotide identity (ANI), which is a standard criterion in species delineation. However, an efficient and easy-to-use computational tool for AAI calculation in large-scale taxonomic studies is not yet available. Here, we introduce a bioinformatic pipeline, named EzAAI, which allows for rapid and accurate AAI calculation in prokaryote sequences. The EzAAI tool is based on the MMSeqs2 program and computes AAI values almost identical to those generated by the standard BLAST algorithm with significant improvements in the speed of these evaluations. Our pipeline also provides a function for hierarchical clustering to create dendrograms, which is an essential part of any taxonomic study. EzAAI is available for download as a standalone JAVA program at http://leb.snu.ac.kr/ezaai.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Chun, J., Oren, A., Ventosa, A., Christensen, H., Arahal, D.R., da Costa, M.S., Rooney, A.P., Yi, H., Xu, X.W., De Meyer, S., et al. 2018. Proposed minimal standards for the use of genome data for the taxonomy of prokaryotes. Int. J. Syst. Evol. Microbiol. 68, 461–466.
Chun, J. and Rainey, F.A. 2014. Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea. Int. J. Syst. Evol. Microbiol. 64, 316–324.
Goris, J., Konstantinidis, K.T., Klappenbach, J.A., Coenye, T., Vandamme, P., and Tiedje, J.M. 2007. DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int. J. Syst. Evol. Microbiol. 57, 81–91.
Hunter, J.D. 2007. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9, 90–95.
Hyatt, D., Chen, G.L., LoCascio, P.F., Land, M.L., Larimer, F.W., and Hauser, L.J. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119.
Konstantinidis, K.T. and Tiedje, J.M. 2005a. Genomic insights that advance the species definition for prokaryotes. Proc. Natl. Acad. Sci. USA 102, 2567–2572.
Konstantinidis, K.T. and Tiedje, J.M. 2005b. Towards a genome-based taxonomy for prokaryotes. J. Bacteriol. 187, 6258–6264.
Lee, I., Kim, Y.O., Park, S.C., and Chun, J. 2016. OrthoANI: an improved algorithm and software for calculating average nucleotide identity. Int. J. Syst. Evol. Microbiol. 66, 1100–1103.
Nicholson, A.C., Gulvik, C.A., Whitney, A.M., Humrighouse, B.W., Bell, M.E., Holmes, B., Steigerwalt, A.G., Villarma, A., Sheth, M., Batra, D., et al. 2020. Division of the genus Chryseobacterium: Observation of discontinuities in amino acid identity values, a possible consequence of major extinction events, guides transfer of nine species to the genus Epilithonimonas, eleven species to the genus Kaistella, and three species to the genus Halpernia gen. nov., with description of Kaistella daneshvariae sp. nov. and Epilithonimonas vandammei sp. nov. derived from clinical specimens. Int. J. Syst. Evol. Microbiol. 70, 4432–4450.
Qin, Q.L., Xie, B.B., Zhang, X.Y., Chen, X.L., Zhou, B.C., Zhou, J., Oren, A., and Zhang, Y.Z. 2014. A proposed genus boundary for the prokaryotes based on genomic insights. J. Bacteriol. 196, 2210–2215.
Richter, M. and Rosselló-Móra, R. 2009. Shifting the genomic gold standard for the prokaryotic species definition. Proc. Natl. Acad. Sci. USA 106, 19126–19131.
Richter, M., Rosselló-Móra, R., Oliver Glöckner, F., and Peplies, J. 2016. JSpeciesWS: a web server for prokaryotic species circumscription based on pairwise genome comparison. Bioinformatics 32, 929–931.
Rodriguez-R, L.M. and Konstantinidis, K.T. 2016. The enveomics collection: a toolbox for specialized analyses of microbial genomes and metagenomes. Peer J. Preprints 4, e1900v1.
Steinegger, M. and Söding, J. 2017. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028.
Virtanen, P., Gommers, R., Oliphant, T.E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., et al. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272.
Walter, J.M., Coutinho, F.H., Dutilh, B.E., Swings, J., Thompson, F.L., and Thompson, C.C. 2017. Ecogenomics and taxonomy of Cyanobacteria phylum. Front. Microbiol. 8, 2132.
Wayne, L.G., Brenner, D.J., Colwell, R.R., Grimont, P.A.D., Kandler, O., Krichevsky, M.I., Moore, L.H., Moore, W.E.C., Murray, R.G.E., Stackebrandt, E., et al. 1987. Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. Int. J. Syst. Evol. Microbiol. 37, 463–464.
Yoon, S.H., Ha, S.M., Kwon, S., Lim, J., Kim, Y., Seo, H., and Chun, J. 2017a. Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. Int. J. Syst. Evol. Microbiol. 67, 1613–1617.
Yoon, S.H., Ha, S.M., Lim, J., Kwon, S., and Chun, J. 2017b. A large-scale evaluation of algorithms to calculate average nucleotide identity. Antonie van Leeuwenhoek 110, 1281–1286.
Zheng, J., Wittouck, S., Salvetti, E., Franz, C.M.A.P., Harris, H.M.B., Mattarelli, P., O’Toole, P.W., Pot, B., Vandamme, P., Walter, J., et al. 2020. A taxonomic note on the genus Lactobacillus: Description of 23 novel genera, emended description of the genus Lactobacillus Beijerinck 1901, and union of Lactobacillaceae and Leuconostocaceae. Int. J. Syst. Evol. Microbiol. 70, 2782–2858.
Acknowledgments
This research was supported by the Korean Institute of Planning and Evaluation for Technology in Food, Agriculture, Forestry and Fisheries (IPET) funded by the Ministry of Agriculture, Food and Rural Affairs (MAFRA) of Korea (grant number 918013-04-4-SB010), and the Collaborative Genome Program for Fostering New Post-Genome Industry through the National Research Foundation of Korea (NRF) funded by the Ministry of Science ICT and Future Planning (grant number NRF-2014M3C9A3063541).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
The authors have no conflicts of interest to declare.
Additional information
Supplemental material for this article may be found at http://www.springerlink.com/content/120956.
Electronic supplementary material
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kim, D., Park, S. & Chun, J. Introducing EzAAI: a pipeline for high throughput calculations of prokaryotic average amino acid identity. J Microbiol. 59, 476–480 (2021). https://doi.org/10.1007/s12275-021-1154-0
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12275-021-1154-0