Introducing EzAAI: a pipeline for high throughput calculations of prokaryotic average amino acid identity

Kim, Dongwook; Park, Sein; Chun, Jongsik

doi:10.1007/s12275-021-1154-0

Microbial Systematics and Evolutionary Microbiology
Open access
Published: 28 April 2021

Volume 59, pages 476–480, (2021)
Journal of Microbiology

Dongwook Kim¹ (equal contribution), Sein Park¹ (equal contribution) & Jongsik Chun¹

Abstract

The average amino acid identity (AAI) is an index of pairwise genomic relatedness, and multiple studies have proposed its application in prokaryotic taxonomy and related disciplines. AAI resolves taxonomic structure beyond the species rank better than average nucleotide identity (ANI), which is a standard criterion for species delineation. However, an efficient and easy-to-use computational tool for AAI calculation in large-scale taxonomic studies is not yet available. Here, we introduce a bioinformatic pipeline, named EzAAI, which allows rapid and accurate AAI calculation for prokaryotic sequences. EzAAI is based on the MMseqs2 program and computes AAI values almost identical to those generated by the standard BLAST algorithm, while running significantly faster. The pipeline also provides a hierarchical clustering function for creating dendrograms, which is an essential part of any taxonomic study. EzAAI is available for download as a standalone Java program at http://leb.snu.ac.kr/ezaai.
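To make the index concrete, the following Python sketch shows one common way an AAI value can be derived from protein search results: keep the reciprocal best hits between two proteomes and average their percent identities. It is an illustration only, not the EzAAI implementation. The function names, the tuple layout `(query, target, identity, bitscore)`, and the toy hit tables are all hypothetical, and the identity and coverage cutoffs that a real pipeline would apply are not given in this text, so they are omitted.

```python
from statistics import mean


def best_hits(hits):
    """Map each query protein to its highest-bitscore hit as (target, identity)."""
    best = {}
    for query, target, identity, bitscore in hits:
        if query not in best or bitscore > best[query][2]:
            best[query] = (target, identity, bitscore)
    return {q: (t, ident) for q, (t, ident, _) in best.items()}


def aai_from_hits(forward, reverse):
    """Average identity over reciprocal best hits between two proteomes.

    forward: hits from searching genome A proteins against genome B
    reverse: hits from searching genome B proteins against genome A
    Both are iterables of (query, target, identity, bitscore) tuples
    (a hypothetical layout chosen for this sketch).
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    identities = []
    for a, (b, ident_ab) in fwd.items():
        # Keep the pair only if b's best hit points back to a.
        if b in rev and rev[b][0] == a:
            identities.append((ident_ab + rev[b][1]) / 2)
    if not identities:
        raise ValueError("no reciprocal best hits found")
    return mean(identities)


# Toy hit tables (made-up values, for illustration only).
forward = [
    ("a1", "b1", 90.0, 200.0),
    ("a1", "b2", 60.0, 80.0),
    ("a2", "b2", 80.0, 150.0),
    ("a3", "b1", 50.0, 40.0),
]
reverse = [
    ("b1", "a1", 92.0, 205.0),
    ("b2", "a2", 78.0, 148.0),
    ("b2", "a1", 61.0, 79.0),
]

aai = aai_from_hits(forward, reverse)
print(f"AAI = {aai:.1f}%")
```

In this toy case only the pairs a1/b1 and a2/b2 are reciprocal best hits, so a3 is ignored. A matrix of such pairwise values, converted to distances (for example 100 minus AAI), is the kind of input a hierarchical clustering step would take to build a dendrogram.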

Acknowledgments

This research was supported by the Korean Institute of Planning and Evaluation for Technology in Food, Agriculture, Forestry and Fisheries (IPET), funded by the Ministry of Agriculture, Food and Rural Affairs (MAFRA) of Korea (grant number 918013-04-4-SB010), and by the Collaborative Genome Program for Fostering New Post-Genome Industry through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT and Future Planning (grant number NRF-2014M3C9A3063541).

Author information

These authors contributed equally to this work.

Authors and Affiliations

Interdisciplinary Program in Bioinformatics, Institute of Molecular Biology & Genetics, School of Biological Sciences, Seoul National University, Seoul, 08826, Republic of Korea
Dongwook Kim, Sein Park & Jongsik Chun

Corresponding author

Correspondence to Jongsik Chun.

Ethics declarations

The authors have no conflicts of interest to declare.

Additional information

Supplemental material for this article may be found at http://www.springerlink.com/content/120956.

Electronic supplementary material

Supplementary material, approximately 336 KB.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kim, D., Park, S. & Chun, J. Introducing EzAAI: a pipeline for high throughput calculations of prokaryotic average amino acid identity. J Microbiol. 59, 476–480 (2021). https://doi.org/10.1007/s12275-021-1154-0

Received: 22 March 2021
Revised: 13 April 2021
Accepted: 15 April 2021
Published: 28 April 2021
Issue Date: May 2021
DOI: https://doi.org/10.1007/s12275-021-1154-0
