CMStatistics 2018
B1149
Title: Extending robust fuzzy clustering to skew data
Authors: Francesca Greselin - University of Milano Bicocca (Italy) [presenting]
Luis Angel Garcia-Escudero - Universidad de Valladolid (Spain)
Agustin Mayo-Iscar - Universidad de Valladolid (Spain)
Abstract: Clustering is an important technique in exploratory data analysis, with applications in image processing, object classification, target recognition, and data mining. The aim is to partition the data according to the natural classes present in them, assigning data points that are more similar to the same cluster. We address this ill-posed problem by adopting a fuzzy clustering method based on mixtures of skew Gaussian distributions, combined with trimming and constrained estimation of the scatter matrices. A set of membership values is used to obtain a fuzzy partition of the data and to contribute to the robust estimates of the mixture parameters. The purpose is to adopt the basic skew Gaussian component for the mixture and to apply impartial trimming to the data, in order to model the skew core of the clusters and to adapt to any type of tail behaviour. The choice of skew Gaussian components is motivated by the fact that, with the increased availability of multivariate datasets, underlying asymmetric structures often appear. In these cases, the extremely useful clustering paradigm given by mixtures of Gaussian distributions appears somewhat unrealistic. Moreover, impartial trimming provides robust ML estimation, even in the presence of outliers in the data. Finally, synthetic and real data are analyzed to show how intermediate membership values are estimated for observations lying in the overlap between clusters, while cluster cores are composed of observations that are assigned to a cluster in a crisp way.
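The interplay between membership values and impartial trimming can be illustrated with a minimal Python sketch. It is not the authors' estimator: it assumes symmetric Gaussian components with known parameters instead of skew Gaussian ones, uses posterior-style memberships, and omits the scatter constraints and the parameter updates entirely. The function names `log_gauss` and `trimmed_memberships` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log density of a multivariate Gaussian at each row of X."""
    d = X.shape[1]
    diff = X - mean
    # Squared Mahalanobis distance of each row from the mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def trimmed_memberships(X, weights, means, covs, alpha):
    """Soft memberships plus trimming of the alpha fraction of least
    plausible observations (ASSUMPTION: simplified Gaussian stand-in)."""
    n = X.shape[0]
    # log(p_j * phi_j(x_i)) for observation i and component j
    log_joint = np.column_stack([
        np.log(w) + log_gauss(X, mu, S)
        for w, mu, S in zip(weights, means, covs)
    ])
    # Log mixture density, computed stably with log-sum-exp
    top = log_joint.max(axis=1, keepdims=True)
    log_mix = top[:, 0] + np.log(np.exp(log_joint - top).sum(axis=1))
    # Memberships of each observation sum to one across components
    U = np.exp(log_joint - log_mix[:, None])
    # Impartial trimming: the data decide which points are discarded
    n_trim = int(np.floor(alpha * n))
    trimmed = np.zeros(n, dtype=bool)
    if n_trim > 0:
        trimmed[np.argsort(log_mix)[:n_trim]] = True
    U[trimmed] = 0.0  # trimmed points contribute to no cluster
    return U, trimmed

# Toy data: two cluster cores, one point in the overlap, one outlier
X = np.array([[0.0, 0.0], [0.2, -0.1], [-0.1, 0.3],
              [4.0, 0.0], [4.1, 0.2], [3.9, -0.2],
              [2.0, 0.0],      # halfway between the two centres
              [50.0, 50.0]])   # gross outlier
weights = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
covs = [np.eye(2), np.eye(2)]
U, trimmed = trimmed_memberships(X, weights, means, covs, alpha=0.125)
```

In this toy setting the outlier is the single trimmed observation, the point halfway between the two centres receives equal membership in both components, and the core points receive memberships close to one for their own cluster.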