Title: Robust clustering of multivariate skew data
Authors: Luis Angel Garcia-Escudero - Universidad de Valladolid (Spain)
Agustin Mayo-Iscar - Universidad de Valladolid (Spain)
Geoffrey McLachlan - University of Queensland (Australia)
Francesca Greselin - University of Milano Bicocca (Italy) [presenting]
Abstract: With the increasing availability of multivariate datasets, attention is turning to methods for model-based clustering that are more robust than classical approaches such as mixtures of Gaussian distributions. Moreover, it is well known that a few outliers in the data can distort maximum likelihood (ML) estimation and thus lead to unreliable inference. These issues call for more flexible and reliable tools for modeling heterogeneous skew data. We introduce a robust approach for estimating mixtures of canonical fundamental skew normal distributions, based on trimming outlying observations and performing constrained estimation. We also provide a feasible EM algorithm to implement model estimation. Before each E-step, we add a trimming step in which the observations that are least plausible under the currently estimated model are tentatively trimmed. In addition, constraints on the scatter matrices are imposed in the M-step to avoid singularities and reduce the occurrence of spurious maximizers. The advantages of the new approach are shown through applications in different fields, and through comparison with recent contributions in the literature, such as mixtures of skew distributions with heavier-than-normal tails.
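
The alternation of trimming, E-step and constrained M-step described above can be illustrated with a minimal sketch. It is not the authors' implementation: Gaussian components stand in for the canonical fundamental skew normal ones to keep the code short, and a crude eigenvalue clipping stands in for the constrained estimation of the scatter matrices. The function names (trimmed_em, constrain_cov, log_gauss) and the parameters alpha (trimming fraction) and c (bound on the eigenvalue ratio) are assumptions made for illustration only.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def constrain_cov(covs, c):
    """Raise small eigenvalues so that the max/min eigenvalue ratio over all
    components is at most c (a simple clipping, assumed here for brevity)."""
    eig = [np.linalg.eigh(S) for S in covs]
    floor = max(vals.max() for vals, _ in eig) / c
    return [(U * np.maximum(vals, floor)) @ U.T for vals, U in eig]

def trimmed_em(X, k, alpha=0.1, c=20.0, n_iter=50, seed=0):
    """Trimmed, constrained EM for a k-component Gaussian mixture (d >= 2)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    keep_n = n - int(np.floor(alpha * n))          # observations retained
    means = X[rng.choice(n, size=k, replace=False)]
    covs = [np.cov(X.T) for _ in range(k)]
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # log(pi_j * phi_j(x_i)) for every observation and component
        logp = np.column_stack([
            np.log(weights[j]) + log_gauss(X, means[j], covs[j])
            for j in range(k)
        ])
        m = logp.max(axis=1, keepdims=True)
        loglik = (m + np.log(np.exp(logp - m).sum(axis=1, keepdims=True))).ravel()
        # Trimming step: drop the observations least plausible under the fit
        keep = np.argsort(loglik)[-keep_n:]
        # E-step on the retained observations only
        resp = np.exp(logp[keep] - loglik[keep, None])
        # M-step, followed by the constraint on the scatter matrices
        nk = resp.sum(axis=0) + 1e-12              # guard against empty components
        weights = nk / nk.sum()
        means = (resp.T @ X[keep]) / nk[:, None]
        covs = []
        for j in range(k):
            diff = X[keep] - means[j]
            covs.append((resp[:, j, None] * diff).T @ diff / nk[j])
        covs = constrain_cov(covs, c)
    trimmed = np.ones(n, dtype=bool)
    trimmed[keep] = False
    return weights, means, covs, trimmed
```

Calling trimmed_em(X, k=2, alpha=0.1) on an n-by-d array returns the mixing weights, the component means, the constrained scatter matrices and a boolean mask marking the trimmed observations. Because the trimmed set is recomputed at every iteration, an observation discarded early can re-enter later, which is what makes the trimming tentative.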