CMStatistics 2018
Title: Model-based clustering in very high dimensions via adaptive projections
Authors: Bernd Taschler - German Center for Neurodegenerative Diseases (Germany) [presenting]
Frank Dondelinger - Lancaster University (United Kingdom)
Sach Mukherjee - German Center for Neurodegenerative Diseases (Germany)
Abstract: Model-based clustering is considered in high-dimensional settings where the dimension $p$ is large relative to the sample size $n$ and where the means, the covariance structures, or both may differ between the latent groups. We propose an approach called {\it Model-based Clustering via Adaptive Projections} or {\it MCAP}. Instead of estimating mixtures in the original space, we work in a low-dimensional space obtained by linear projection. The projection dimension plays an important role and governs a type of bias-variance trade-off. MCAP sets the projection dimension automatically in a data-adaptive manner. The mixture modelling itself is done using a full covariance formulation and this, combined with the adaptive projection, allows detection of both mean and covariance signals in very high-dimensional problems. We show real-data examples in which covariance signals are reliably detected in problems with $p \sim 10^4$ or more, and examples where MCAP maintains performance even when the mean signal is entirely removed, leaving differential covariance structure in the high-dimensional space as the only signal. Across a number of regimes, MCAP performs as well as or better than a range of existing methods, including a recently proposed $\ell_1$-penalized approach, and performance remains broadly stable with increasing dimension, at low computational cost.
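As a rough illustration of why the projection dimension matters, the generic setup of a full-covariance Gaussian mixture fitted after a linear projection can be written as below. This is a sketch under stated assumptions, not the MCAP procedure itself: the symbols $A$ (projection matrix), $q$ (projection dimension), $K$ (number of groups), $z_i$ (projected observation) and $\pi_k$ (mixing weights) are introduced here for illustration, and the abstract does not specify how the projection is constructed or how $q$ is chosen.

```latex
% Assumed notation: x_i \in \mathbb{R}^p is an observation, A \in \mathbb{R}^{p \times q}
% a linear projection with q \ll p, and K the number of latent groups.
\begin{align*}
  z_i &= A^{\top} x_i \in \mathbb{R}^{q}, \qquad i = 1, \dots, n, \\
  z_i &\sim \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mu_k, \Sigma_k),
        \qquad \mu_k \in \mathbb{R}^{q}, \quad
        \Sigma_k \in \mathbb{R}^{q \times q} \text{ (full, unconstrained)}.
\end{align*}

% Free parameters of a K-component full-covariance Gaussian mixture in dimension d:
\[
  \#\text{params}(d) = (K - 1) + K d + K \, \frac{d(d+1)}{2}.
\]

% Per component (means plus covariance entries):
%   d = p = 10^4 :  10^4 + 10^4 (10^4 + 1)/2 = 50{,}015{,}000
%   d = q = 10   :  10   + 10 \cdot 11 / 2   = 65
```

Under these assumptions, a larger $q$ retains more of the mean and covariance signal but requires estimating on the order of $q^2$ covariance entries per group from $n$ samples, while a smaller $q$ is easier to estimate but may discard signal. This is one way to read the bias-variance trade-off that the projection dimension is said to govern.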