Title: Semi-supervised multi-view Bayesian nonparametric clustering for integrative genomics
Authors: Paul Kirk - University of Cambridge (United Kingdom) [presenting]
Abstract: Although the challenges presented by high-dimensional data in the context of regression are well known and the subject of much current research, comparatively little work has addressed them in the context of clustering. In this setting, the key challenge is that often only a small subset of the covariates provides a relevant stratification of the population. Identifying relevant strata can be particularly challenging in high-dimensional datasets, in which there may be many covariates that provide no information whatsoever about population structure, or (perhaps worse) in which there may be potentially large covariate subsets that define irrelevant stratifications. For example, when dealing with genetic data, some genetic variants may allow us to group patients in terms of disease risk, while others would provide completely irrelevant stratifications (e.g. variants that would group patients together on the basis of eye or hair colour). Bayesian profile regression is a semi-supervised model-based clustering approach that makes use of a response in order to guide the clustering toward relevant stratifications. Here we consider how this approach can be extended to the "multiview" setting, in which different groups of covariates ("views") define different stratifications. We present some results in the context of cancer subtyping to illustrate how the approach can be used to perform integrative clustering of multiple 'omics datasets.