Title: Family of mixtures of multivariate Poisson log-normal distributions for clustering high dimensional count data
Authors: Andrea Payne - Carleton University (Canada) [presenting]
Anjali Silva - University of Guelph (Canada)
Steven Rothstein - University of Guelph (Canada)
Paul McNicholas - McMaster University (Canada)
Sanjeena Dang - Carleton University (Canada)
Abstract: Multivariate count data encountered in bioinformatics are high dimensional and often exhibit over-dispersion. Mixtures of multivariate Poisson log-normal (MPLN) models have been used to analyze these multivariate count measurements efficiently. In the MPLN model, the latent variable comes from a multivariate Gaussian distribution, and the counts, conditional on this latent variable, are modeled using a Poisson distribution. The MPLN model can account for over-dispersion and allows for correlation between the variables. We extend the mixture of MPLN distributions to high dimensional data by incorporating a factor analyzer structure in the latent space. A parsimonious family of mixtures of MPLN distributions is proposed by decomposing the covariance matrix and imposing constraints on these decompositions. We demonstrate the performance of the model using simulated and real datasets.
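The generative structure described in the abstract can be illustrated with a minimal simulation sketch for a single mixture component. It assumes the usual factor analyzer form for the latent covariance, Sigma = Lambda Lambda^T + Psi with Psi diagonal, and a log link between the latent Gaussian and the Poisson mean; the abstract does not spell out either detail. The function name `simulate_mpln_factor` and the parameter names `Lambda` and `psi` are hypothetical, and the sketch covers simulation only, not the authors' estimation procedure or the constrained family.

```python
import numpy as np

def simulate_mpln_factor(n, mu, Lambda, psi, seed=0):
    """Draw n count vectors from one MPLN component (illustrative sketch).

    Assumed structure (not stated in the abstract):
      theta ~ N(mu, Lambda @ Lambda.T + diag(psi))   latent Gaussian
      Y_j | theta ~ Poisson(exp(theta_j))            counts via a log link
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    Lambda = np.asarray(Lambda, dtype=float)   # d x q loading matrix, q < d
    psi = np.asarray(psi, dtype=float)         # d positive noise variances

    # Factor analyzer covariance: low-rank part plus diagonal noise.
    Sigma = Lambda @ Lambda.T + np.diag(psi)

    # Latent layer: one d-dimensional Gaussian vector per observation.
    theta = rng.multivariate_normal(mu, Sigma, size=n)

    # Observed layer: counts are conditionally independent Poisson draws.
    counts = rng.poisson(np.exp(theta))
    return counts, Sigma

# Hypothetical example: d = 4 variables, q = 2 latent factors.
mu = np.ones(4)
Lambda = np.full((4, 2), 0.5)
psi = np.full(4, 0.2)
counts, Sigma = simulate_mpln_factor(2000, mu, Lambda, psi)
```

Under these assumptions the covariance needs d*q + d parameters rather than d*(d+1)/2, which is the source of the parsimony in high dimensions. The off-diagonal entries of Sigma induce correlation between the counts, and the extra latent variance makes each marginal variance exceed its mean.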