CMStatistics 2020
Title: Challenges and proposals for Dirichlet process mixture models (DPMM) with Gaussian kernels
Authors: Wei Jing - University of St Andrews (United Kingdom) [presenting]
Michail Papathomas - University of St Andrews (United Kingdom)
Silvia Liverani - Queen Mary University of London (United Kingdom)
Abstract: The Dirichlet process mixture model (DPMM) is considered for clustering continuous data when the conditional likelihood is multivariate normal. Simulation studies show that the DPMM struggles to uncover the true clusters when the data contain as few as a handful of variables, even when the normality assumption is correct. An introduction to the DPMM is given first, followed by simulation examples highlighting the problem. Potential reasons for the problem are then analyzed. One is the difference between the overall covariance matrix of the variables (calculated by pooling the data from all clusters) and the within-cluster covariance matrices, which impedes the sampler from moving towards the target cluster allocation. Another possible factor is an unsuitable prior distribution for the within-cluster covariance matrices. Different priors that can be placed on the within-cluster covariance matrices are reviewed, and their performance is assessed and compared. Finally, other aspects that can improve or influence the performance of the DPMM are discussed.
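The gap between the pooled and the within-cluster covariance matrices can be illustrated with a small simulation. The following is a minimal sketch and not the authors' simulation study: the two cluster centres, the unit within-cluster variances, the sample sizes, and all function names are assumptions chosen purely for illustration.

```python
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    mu = mean_vector(rows)
    return [
        [sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def simulate_cluster(rng, centre, n):
    """Draw n points whose coordinates are independent N(centre[j], 1)."""
    return [[rng.gauss(c, 1.0) for c in centre] for _ in range(n)]


# Hypothetical setting: two well-separated clusters in two dimensions,
# each with (approximately) identity within-cluster covariance.
rng = random.Random(0)
cluster_a = simulate_cluster(rng, [-3.0, 0.0], 500)
cluster_b = simulate_cluster(rng, [3.0, 0.0], 500)

within_a = covariance_matrix(cluster_a)
within_b = covariance_matrix(cluster_b)
# Pooling ignores the cluster labels, so the spread between the cluster
# means is absorbed into the covariance estimate.
pooled = covariance_matrix(cluster_a + cluster_b)

print("within-cluster A:", within_a)
print("within-cluster B:", within_b)
print("pooled:          ", pooled)
```

In this assumed setting the pooled variance of the first variable is inflated by the separation between the cluster means, while the second variable, along which the means coincide, is almost unaffected. A prior for the within-cluster covariance matrices that is centred on the pooled estimate would therefore be poorly matched to the true within-cluster scale along the first variable.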