Title: A Bayesian zero-inflated Dirichlet-multinomial regression model for multivariate compositional count data
Authors: Matthew Koslovsky - Colorado State University (United States) [presenting]
Abstract: The Dirichlet-multinomial (DM) distribution plays a fundamental role in modern statistical methodology development and application. Recently, the DM distribution and its variants have been used extensively to model multivariate count data generated by high-throughput sequencing technology in omics research, because they accommodate both the compositional structure of the data and overdispersion. A major limitation of the DM distribution is that it cannot handle the excess zeros typically found in practice, which may bias inference. To fill this gap, we propose a novel Bayesian zero-inflated DM model for multivariate compositional count data with excess zeros. We then extend our approach to regression settings and embed sparsity-inducing priors to perform variable selection for high-dimensional covariate spaces. Throughout, modeling decisions are made to boost scalability without sacrificing interpretability, imposing limiting assumptions, or relying on approximation techniques. We apply the model to a benchmark human microbiome data set and compare the performance of the proposed method to existing approaches.
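Illustration (not part of the abstract): the sketch below simulates counts from one possible zero-inflated DM construction, to show how structural zeros can sit alongside compositional, overdispersed counts. It is an assumption rather than the model proposed in the talk. It relies on the standard fact that normalizing independent gamma draws gives a Dirichlet vector, and it zeroes out a category's gamma draw with a fixed probability. The function name `sample_zidm` and the parameters `alpha` and `zero_prob` are hypothetical.

```python
import numpy as np

def sample_zidm(n_samples, n_trials, alpha, zero_prob, rng):
    """Draw counts from an illustrative zero-inflated DM construction.

    alpha     : positive concentration parameters, one per category
    zero_prob : probability that each category is a structural zero
    """
    alpha = np.asarray(alpha, dtype=float)
    zero_prob = np.asarray(zero_prob, dtype=float)
    n_categories = alpha.size
    counts = np.zeros((n_samples, n_categories), dtype=np.int64)
    for i in range(n_samples):
        # A category is "present" unless it is drawn as a structural zero.
        present = rng.random(n_categories) >= zero_prob
        if not present.any():
            # Assumption of this sketch: keep at least one category present
            # so that the composition is well defined.
            present[rng.integers(n_categories)] = True
        # Normalized independent gammas give Dirichlet probabilities;
        # absent categories contribute exactly zero.
        gammas = np.where(present, rng.gamma(shape=alpha, scale=1.0), 0.0)
        probs = gammas / gammas.sum()
        counts[i] = rng.multinomial(n_trials, probs)
    return counts

rng = np.random.default_rng(0)
counts = sample_zidm(n_samples=5, n_trials=100,
                     alpha=[1.0, 2.0, 3.0, 4.0],
                     zero_prob=[0.0, 0.3, 0.6, 0.9], rng=rng)
print(counts)
```

Each row sums to `n_trials`, and categories with a larger `zero_prob` are zero more often than a plain DM would produce. In a regression setting, `alpha` and `zero_prob` would depend on covariates; that part is omitted here.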