B1523
Title: Bayesian correlated clustering of multiple high-dimensional datasets using mixtures of factor analysers
Authors: Johan van der Molen Moris - University of Cambridge (United Kingdom) [presenting]
Abstract: Factor analysis is commonly used as a dimensionality-reduction step to pre-process data for clustering. More recently, Bayesian mixtures of factor analysers have been proposed to perform clustering and dimensionality reduction jointly in a more principled manner. We extend this model to the Bayesian integrative clustering setting, where multiple high-dimensional datasets are clustered simultaneously, by using multiple dependent mixtures of factor analysers. This is particularly relevant in molecular precision medicine, where we would like to identify disease subtypes on the basis of multiple omics data layers. Each observation unit (e.g. a patient) has data from multiple high-dimensional sources, such as gene or protein expression. These data sources provide complementary information, so it is critical to integrate them in the analysis. However, Bayesian model-based clustering of high-dimensional data presents challenges even in the single-dataset case, owing to the difficulty of exploring multiple posterior modes, and these challenges are exacerbated when dealing with multiple high-dimensional datasets. We present a Bayesian correlated clustering mixture of factor analysers model to address these challenges, and further elucidate the difficulties of performing MCMC-based parameter inference in such models.