COMPSTAT 2016
Title: Estimating the number of clusters in OTRIMLE robust Gaussian mixture clustering
Authors: Christian Hennig - UCL (United Kingdom) [presenting]
Pietro Coretto - University of Salerno (Italy)
Abstract: The method of Optimally Tuned Robust Improper Maximum Likelihood (OTRIMLE) was recently developed for clustering data based on a Gaussian mixture model, while allowing for some observations that cannot reasonably be assigned to any cluster. Those observations are modelled by a constant pseudo-density. This constant is found by minimising the Kolmogorov distance between a chi-squared distribution and the distribution of the Mahalanobis distances of the observations classified as non-outlying to their corresponding cluster means. We explore model diagnostics and estimation of the number of clusters by parametric bootstrap: we generate many datasets from the Gaussian mixture (non-outlying) part of the estimated mixture, using the estimated parameters, and we compare the distribution of the values of the Mahalanobis criterion to the value achieved for the dataset under study. This allows us to see which numbers of clusters yield models that are consistent with the data.
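The parametric bootstrap check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it simplifies heavily: the data are two-dimensional (so the chi-squared reference distribution has 2 degrees of freedom and a closed-form CDF), there is no noise component, the criterion uses squared Mahalanobis distances, and the model is not refitted on each bootstrap dataset. The "estimated" parameters are simply supplied by hand or taken from a pooled mean and covariance, not computed by OTRIMLE. All function names (`criterion`, `bootstrap_check`, `pooled_gaussian`, and so on) are hypothetical.

```python
import math
import random

def chi2_cdf_2df(x):
    """CDF of the chi-squared distribution with 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0) if x > 0 else 0.0

def sq_mahalanobis(pt, mean, cov):
    """Squared Mahalanobis distance of a 2-D point to (mean, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = pt[0] - mean[0], pt[1] - mean[1]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def kolmogorov_distance(sample, cdf):
    """Sup distance between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))

def sample_mixture(n, model, rng):
    """Draw n points from a 2-D Gaussian mixture (weights, means, covs)."""
    weights, means, covs = model
    points = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        (a, b), (_, d) = covs[k]
        # Cholesky factor of the 2x2 covariance matrix.
        l11 = math.sqrt(a)
        l21 = b / l11
        l22 = math.sqrt(d - l21 * l21)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        points.append((means[k][0] + l11 * z1, means[k][1] + l21 * z1 + l22 * z2))
    return points

def criterion(points, model):
    """Kolmogorov distance between the squared Mahalanobis distances of the
    points (each to its most probable cluster) and chi-squared with 2 d.f."""
    weights, means, covs = model
    dists = []
    for pt in points:
        best = None
        for w, m, c in zip(weights, means, covs):
            d2 = sq_mahalanobis(pt, m, c)
            det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
            score = math.log(w) - 0.5 * math.log(det) - 0.5 * d2
            if best is None or score > best[0]:
                best = (score, d2)
        dists.append(best[1])
    return kolmogorov_distance(dists, chi2_cdf_2df)

def bootstrap_check(points, model, n_boot, rng):
    """Observed criterion and the share of bootstrap values at least as large.
    Simplification: the model is NOT refitted on each bootstrap dataset."""
    observed = criterion(points, model)
    boot = [criterion(sample_mixture(len(points), model, rng), model)
            for _ in range(n_boot)]
    return observed, sum(b >= observed for b in boot) / n_boot

def pooled_gaussian(points):
    """One-cluster model from the overall sample mean and covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return ([1.0], [(mx, my)], [((sxx, sxy), (sxy, syy))])

rng = random.Random(0)
eye = ((1.0, 0.0), (0.0, 1.0))
# Illustrative data: two well-separated clusters (parameters chosen by hand).
two_clusters = ([0.5, 0.5], [(0.0, 0.0), (6.0, 6.0)], [eye, eye])
data = sample_mixture(300, two_clusters, rng)

obs1, tail1 = bootstrap_check(data, pooled_gaussian(data), 200, rng)
obs2, tail2 = bootstrap_check(data, two_clusters, 200, rng)
print("1 cluster :", round(obs1, 3), tail1)
print("2 clusters:", round(obs2, 3), tail2)
```

In this toy setting the one-cluster model should yield an observed criterion well outside its bootstrap distribution, whereas the two-cluster model should not stand out in the same way. A faithful version would rerun the full OTRIMLE fit, including the noise component, on every bootstrap dataset.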