CMStatistics 2021
B0673
Title: Selecting clustering algorithms and solutions via quadratic scoring
Authors: Luca Coraggio - University of Naples Federico II (Italy) [presenting]
Pietro Coretto - University of Salerno (Italy)
Abstract: A novel methodology is introduced to score clustering solutions and select the optimal one from a set of candidate solutions. In particular, we develop a framework where clustering solutions are represented via triplets of parameters: the clusters' proportions, centres and scatters. This representation is used together with the quadratic score function (central to Quadratic Discriminant Analysis) to develop two novel cluster quality criteria, named quadratic scores. These assess the extent to which sample points are well accommodated into the quadratic regions defined by the clustering. We show that the proposed criteria are consistent with clusters generated from a restricted class of mixtures of elliptically symmetric distributions, including the Gaussian model. Nonetheless, the proposed criteria are method-independent: they do not rely on any particular clustering framework or algorithm and can be computed for any clustering solution. We also propose variations on the quadratic scores, which make use of cross-validation and bootstrap resampling. We compare our proposals with several established criteria from the literature for selecting clustering solutions, including both method-independent and model-based criteria. The proposed methodology ranks among the best-performing criteria in an extensive empirical study on both simulated and real data sets, involving 440 clustering solutions per data set.
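The abstract does not state the formulas for the two quadratic scores, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses the standard quadratic discriminant score from Quadratic Discriminant Analysis, log(proportion) - 0.5 * log det(scatter) - 0.5 * squared Mahalanobis distance, evaluated on a (proportion, centre, scatter) triplet. The aggregation in `hard_score` (average of each point's best score across clusters) and all function and variable names are hypothetical illustrations and are not claimed to match either of the authors' criteria.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def quadratic_score(x, prop, centre, scatter):
    """Standard QDA quadratic score of point x for one cluster triplet."""
    L = cholesky(scatter)
    d = [xi - ci for xi, ci in zip(x, centre)]
    # Forward substitution solves L z = d; then z.z is the squared
    # Mahalanobis distance of x from the centre.
    z = []
    for i in range(len(d)):
        z.append((d[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(d)))
    return math.log(prop) - 0.5 * log_det - 0.5 * sum(v * v for v in z)

def hard_score(points, triplets):
    """Hypothetical aggregate: mean over points of the best score across clusters."""
    best = [max(quadratic_score(x, *t) for t in triplets) for x in points]
    return sum(best) / len(best)

# Toy data (made up for illustration): two tight groups near (0, 0) and (5, 5).
points = [(0.1, -0.2), (-0.3, 0.1), (0.2, 0.2), (5.1, 4.9), (4.8, 5.2), (5.0, 5.1)]
identity = [[1.0, 0.0], [0.0, 1.0]]

# Two candidate solutions, each a list of (proportion, centre, scatter) triplets.
good = [(0.5, (0.0, 0.0), identity), (0.5, (5.0, 5.0), identity)]
bad = [(0.5, (0.0, 5.0), identity), (0.5, (5.0, 0.0), identity)]

# Select the candidate with the larger aggregate score.
candidates = {"good": good, "bad": bad}
selected = max(candidates, key=lambda name: hard_score(points, candidates[name]))
```

Because the score needs only the triplets and the data, the same function can rank candidate solutions produced by any clustering algorithm, which is the sense in which such a criterion is method-independent.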