CMStatistics 2017
Title: A robust clustering procedure with unknown number of clusters
Authors: Francesco Dotto - Sapienza - University of Rome (Italy) [presenting]
Alessio Farcomeni - Sapienza - University of Rome (Italy)
Abstract: A new methodology is proposed for robust clustering that does not require the number of underlying Gaussian clusters to be specified in advance. The procedure is based on iteratively trimming, assessing the goodness of fit, and reweighting. The forward version of our procedure starts by fixing a high trimming level and $K=1$ population. The procedure is then iterated over a fixed sequence of decreasing trimming levels. New observations are added at each step and, whenever necessary, the number of components $K$ is increased. Goodness of fit is assessed against the empirical distribution of the Mahalanobis distances of the untrimmed observations from the closest centroid, with parameters estimated at the previous iteration. A stopping rule prevents our procedure from using outlying observations, while a backward criterion is adopted whenever too many clusters are detected. A simulation study shows that our method compares well with robust procedures with known number of clusters, and is robust in the presence of different contamination schemes.
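As an illustration only, a single trimming step of the kind described in the abstract might be sketched as follows. This is not the authors' implementation. The function names `closest_mahalanobis_sq` and `trim` are hypothetical, the centroids and covariance matrices are assumed to be supplied from a previous iteration, and the reweighting, the goodness-of-fit assessment, the stopping rule, and the update of $K$ are not covered.

```python
import numpy as np

def closest_mahalanobis_sq(X, means, covs):
    """Squared Mahalanobis distance of each row of X to its closest centroid.

    Hypothetical helper: `means` and `covs` stand in for the parameters
    estimated at the previous iteration.
    """
    d2 = np.empty((X.shape[0], len(means)))
    for k, (mu, cov) in enumerate(zip(means, covs)):
        diff = X - mu
        # Solve cov @ z = diff.T rather than forming the inverse explicitly.
        d2[:, k] = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return d2.min(axis=1), d2.argmin(axis=1)

def trim(X, means, covs, alpha):
    """Keep the ceil((1 - alpha) * n) observations closest to a centroid."""
    d2, labels = closest_mahalanobis_sq(X, means, covs)
    n_keep = int(np.ceil((1.0 - alpha) * len(X)))
    keep = np.zeros(len(X), dtype=bool)
    keep[np.argsort(d2)[:n_keep]] = True
    return keep, labels, d2

# Toy data: 50 standard-normal points in 2-D plus one gross outlier.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), [[25.0, 25.0]]])

# K = 1 with an assumed centroid at the origin and identity covariance.
keep, labels, d2 = trim(X, [np.zeros(2)], [np.eye(2)], alpha=0.1)
```

Lowering `alpha` across successive calls mimics the decreasing sequence of trimming levels: each call admits more observations, and the returned distances `d2` are the quantities on which a goodness-of-fit check would operate.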