CMStatistics 2022
Title: Bayesian clustering of high-dimensional data via latent repulsive mixtures
Authors: Lorenzo Ghilotti - University of Milano-Bicocca (Italy) [presenting]
Mario Beraha - Università di Torino (Italy)
Alessandra Guglielmi - Politecnico di Milano (Italy)
Abstract: In modern applications, it is common to collect high-dimensional data and to cluster subjects based on them. Mixture models have been shown to produce inconsistent inference in that setting, and a general class of models, called Lamb, has been proposed to overcome this issue. The approach links observations to a set of low-dimensional latent factors through a matrix of loadings and performs model-based clustering via nonparametric mixture models on the latent space. However, Lamb models are likely to be misspecified, which leads to inconsistent clustering. Repulsive mixture models have recently shown empirical robustness to misspecification, though only for low-dimensional data. We propose, within the class of Lamb models, to employ a repulsive mixture model to cluster the latent factors. To this end, we propose a general construction for anisotropic determinantal point processes (DPPs), which guarantees the analytical availability of their spectral densities. We employ such a DPP as the prior in a repulsive mixture model on the latent factors and let the matrix of factor loadings drive the anisotropic behavior, so that separation is induced between the high-dimensional centers of different clusters. An efficient MCMC algorithm is proposed, and the methodology is compared to existing methods.
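
As a rough illustration of the latent-factor structure described in the abstract, the sketch below simulates data in which each observation is a loading matrix applied to a low-dimensional factor, with the factors drawn from a mixture. It is not the authors' model or algorithm: the Gaussian kernels, the isotropic noise, the fixed number of clusters, and every name and parameter value (`simulate_latent_factor_mixture`, `loadings`, `centers`, `noise_sd`, and so on) are assumptions made only for illustration, and neither the DPP prior nor the MCMC sampler is implemented. The last lines check a simple identity that may help explain why the loadings can drive anisotropy: the squared distance between two high-dimensional centers equals a quadratic form in the latent centers weighted by the Gram matrix of the loadings.

```python
import numpy as np


def simulate_latent_factor_mixture(n=200, p=50, d=2, k=3, noise_sd=0.5, seed=0):
    """Simulate y_i = Lambda @ eta_i + eps_i, with eta_i from a k-component Gaussian mixture.

    All distributional choices here are illustrative assumptions, not the authors' specification.
    """
    rng = np.random.default_rng(seed)
    loadings = rng.normal(size=(p, d))             # Lambda: p x d matrix of factor loadings
    centers = rng.normal(scale=4.0, size=(k, d))   # cluster centers in the latent space
    labels = rng.integers(0, k, size=n)            # cluster membership of each subject
    factors = centers[labels] + rng.normal(size=(n, d))               # latent factors eta_i
    data = factors @ loadings.T + noise_sd * rng.normal(size=(n, p))  # observations y_i
    return data, factors, labels, loadings, centers


data, factors, labels, loadings, centers = simulate_latent_factor_mixture()

# Cluster centers mapped to the observation space: Lambda @ mu_h for each cluster h.
high_dim_centers = centers @ loadings.T

# Pairwise squared distances between high-dimensional centers, computed directly...
high_diff = high_dim_centers[:, None, :] - high_dim_centers[None, :, :]
dist_sq_high_dim = (high_diff ** 2).sum(axis=-1)

# ...and as the quadratic form (mu_h - mu_j)^T (Lambda^T Lambda) (mu_h - mu_j) in the latent space.
latent_diff = centers[:, None, :] - centers[None, :, :]
gram = loadings.T @ loadings
dist_sq_latent_metric = np.einsum("hjd,de,hje->hj", latent_diff, gram, latent_diff)

print(np.allclose(dist_sq_high_dim, dist_sq_latent_metric))
```

Under these assumptions, two latent centers that are far apart in Euclidean distance can still map to nearby high-dimensional centers if their difference lies along a direction that the loadings shrink, which is one way to read the motivation for a repulsion that depends on the loading matrix.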