Title: On robustness issues in modal clustering
Authors: Giovanna Menardi - University of Padova (Italy) [presenting]
Marco Rudelli - University of Padova (Italy)
Luca Greco - University G. Fortunato of Benevento (Italy)
Abstract: Deviations from model assumptions, along with the presence of a certain amount of outlying observations, are common in many practical statistical applications. Clustering techniques are no exception, yet some caution is required in this context. First, small clusters could be mistaken for outlying observations, or vice versa. Second, the concept of outlier itself should be defined with respect to a cluster, rather than the entire data set, and therefore depends on the notion of cluster being considered. While robust methods have been proposed in both distance- and model-based clustering, the issue has been largely neglected in the modal framework, where clusters are associated with the domains of attraction of the modes of the density underlying the data. Nonparametric methods, usually employed to estimate the density (and hence its modes), are known to be vulnerable to the presence of outliers and sensitive to data sparsity in high dimensions, where much of the probability mass tends to flow to the tails of the density, possibly giving rise to spurious modes. Robustness issues are discussed in this framework, and suitable measures to flag outliers are explored, especially with a view to trimming methods for modal clustering.
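To make the modal notion of cluster concrete, the following Python sketch assigns each observation to the mode reached by mean-shift ascent on a one-dimensional Gaussian kernel density estimate, then flags candidate outliers for trimming. It is a minimal illustration under stated assumptions, not the authors' procedure: the Gaussian kernel, the bandwidth h, the merge tolerance, and both flagging rules (density at a point relative to the density at its cluster's mode, and a minimum cluster size) are hypothetical choices made here. The relative-density rule reflects the remark that outlyingness should be judged with respect to a cluster rather than the whole data set.

```python
import math


def kde(x, data, h):
    """Gaussian kernel density estimate at point x (1-D data, bandwidth h)."""
    c = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


def mean_shift_mode(x, data, h, tol=1e-8, max_iter=500):
    """Follow mean-shift ascent from x to a local mode of the Gaussian KDE."""
    for _ in range(max_iter):
        w = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data]
        new_x = sum(wi * xi for wi, xi in zip(w, data)) / sum(w)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def modal_clustering(data, h, merge_tol=None):
    """Label each point by the mode whose domain of attraction contains it."""
    if merge_tol is None:
        merge_tol = h / 2  # hypothetical rule for merging numerically equal modes
    modes, labels = [], []
    for xi in data:
        m = mean_shift_mode(xi, data, h)
        for k, mk in enumerate(modes):
            if abs(m - mk) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


def flag_outliers(data, h, modes, labels, ratio=0.05, min_cluster_size=2):
    """Hypothetical trimming rule: flag a point if its cluster is smaller than
    min_cluster_size, or if f(x) / f(mode of its cluster) < ratio."""
    mode_dens = [kde(m, data, h) for m in modes]
    sizes = [labels.count(k) for k in range(len(modes))]
    return [
        sizes[k] < min_cluster_size or kde(x, data, h) / mode_dens[k] < ratio
        for x, k in zip(data, labels)
    ]


if __name__ == "__main__":
    data = [0.0, 0.2, 0.4, 0.6, 5.0, 5.2, 5.4, 5.6, 20.0]
    modes, labels = modal_clustering(data, h=0.5)
    print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, 2]
    print(flag_outliers(data, 0.5, modes, labels))  # only the point at 20.0 is True
```

With these assumptions the isolated observation at 20.0 forms its own mode, a small instance of the spurious-mode problem described in the abstract; it is flagged only because of the hypothetical minimum-cluster-size rule, which in turn risks discarding genuine small clusters.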