Abstract
Human microbiome data are increasingly common in biomedical research because of their connections with many types of disease. A widely used discrete distribution for analyzing this kind of data is the Dirichlet-multinomial. Despite its popularity, this distribution often fails to model microbiome data adequately because of the strict parameterization imposed on its covariance matrix. The aim of this work is to propose a new distribution for analyzing microbiome data and to define a regression model based on it. The new distribution can be expressed as a structured finite mixture model with Dirichlet-multinomial components. We illustrate how this mixture structure can improve microbiome data analysis by clustering patients into “enterotypes”, a classification based on the bacteriological composition of the gut microbiota. The two models are compared through an application to a real gut microbiome dataset.
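To make the covariance restriction mentioned in the abstract concrete, the following is a minimal sketch of the standard Dirichlet-multinomial only, not of the proposed mixture model. It assumes NumPy is available; the parameter values (`alpha`, `n_trials`) and the helper names are hypothetical and chosen purely for illustration. Counts are simulated by drawing a probability vector from a Dirichlet and then a multinomial count vector, and the empirical covariance is compared with the textbook formula, in which a single scale factor multiplies the multinomial covariance and every pair of taxa is forced to be negatively correlated.

```python
import numpy as np


def sample_dirichlet_multinomial(rng, n_trials, alpha, size):
    """Draw `size` count vectors: p ~ Dirichlet(alpha), then y | p ~ Multinomial(n_trials, p)."""
    probs = rng.dirichlet(alpha, size=size)
    return np.array([rng.multinomial(n_trials, p) for p in probs])


def dm_covariance(n_trials, alpha):
    """Theoretical covariance matrix of a Dirichlet-multinomial count vector."""
    alpha = np.asarray(alpha, dtype=float)
    a_plus = alpha.sum()          # precision parameter (sum of the alphas)
    pi = alpha / a_plus           # mean proportions
    # One overdispersion factor rescales the whole multinomial covariance.
    scale = n_trials * (n_trials + a_plus) / (1.0 + a_plus)
    return scale * (np.diag(pi) - np.outer(pi, pi))


rng = np.random.default_rng(0)
alpha = np.array([2.0, 5.0, 3.0])  # hypothetical parameters for three taxa
n_trials = 100                     # hypothetical sequencing depth per sample

counts = sample_dirichlet_multinomial(rng, n_trials, alpha, size=20000)
empirical = np.cov(counts, rowvar=False)
theoretical = dm_covariance(n_trials, alpha)

print(np.round(theoretical, 1))
print(np.round(empirical, 1))
```

Because the off-diagonal entries are always negative and the whole matrix depends on one extra parameter beyond the means, the Dirichlet-multinomial cannot represent, for example, positively associated taxa. This is the kind of rigidity that a mixture of Dirichlet-multinomial components is meant to relax.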
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
About this paper
Cite this paper
Ascari, R., Migliorati, S. (2023). A New Regression Model for the Analysis of Microbiome Data. In: Brito, P., Dias, J.G., Lausen, B., Montanari, A., Nugent, R. (eds) Classification and Data Science in the Digital Age. IFCS 2022. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-031-09034-9_5
DOI: https://doi.org/10.1007/978-3-031-09034-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09033-2
Online ISBN: 978-3-031-09034-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)