Title: Model ensemble in density-based clustering
Authors: Alessandro Casa - Free University of Bozen-Bolzano (Italy) [presenting]
Luca Scrucca - Università degli Studi di Perugia (Italy)
Giovanna Menardi - University of Padova (Italy)
Abstract: Model-based clustering is a widely used approach for finding groups in data. Finite mixture models are adopted to describe the data-generating mechanism, and partitions are obtained by drawing a correspondence between mixture components and groups. Operationally, several models with different parameterizations and numbers of components are estimated, and the best one is selected by means of an information criterion. Relying on a single model to cluster the data may nonetheless be sub-optimal, since discarding all fitted models except the best one could result in a harmful loss of information. To overcome this issue, we propose an ensemble clustering approach that circumvents the single-best-model paradigm and may thereby improve the stability and robustness of the resulting partitions. We introduce a new density estimator, defined as a convex combination of the density estimates in the ensemble, and exploit it for group assignment. Finally, since the correspondence between mixture components and clusters is lost in the process, we define partitions by borrowing the modal, or nonparametric, formulation of the clustering problem, in which groups are associated with the high-density regions of the estimated density.
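The abstract gives no formulas, so the Python sketch below only illustrates the general pipeline it describes, under several assumptions that are not taken from the authors' work. The ensemble density is written as f(x) = sum over m of w_m f_m(x), with w_m >= 0 and the w_m summing to one. The "fitted" models are univariate Gaussian mixtures with hand-picked parameters standing in for EM fits. The weights are assumed to be proportional to exp(BIC_m / 2), one common model-averaging choice; the abstract does not say how the weights are chosen. Modal clustering is carried out with a modal EM (mean-shift style) hill climb for Gaussian mixtures, swapped in as one possible way to locate the modes. All function names are hypothetical.

```python
import math

# Each hypothetical "fitted" model is a univariate Gaussian mixture, stored as
# a list of (mixing weight, mean, standard deviation) triples.


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, comps):
    return sum(p * normal_pdf(x, mu, sd) for p, mu, sd in comps)


def bic(data, comps):
    # Larger-is-better convention: 2*loglik - npar*log(n).
    # Assumes every component has its own mean and variance: npar = 3K - 1.
    loglik = sum(math.log(mixture_pdf(x, comps)) for x in data)
    npar = 3 * len(comps) - 1
    return 2.0 * loglik - npar * math.log(len(data))


def ensemble_weights(data, models):
    # Assumed weighting scheme: w_m proportional to exp(BIC_m / 2).
    scores = [bic(data, comps) for comps in models]
    best = max(scores)  # subtract the max before exponentiating, for stability
    raw = [math.exp(0.5 * (s - best)) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


def ensemble_components(models, weights):
    # A convex combination of Gaussian mixtures is itself a Gaussian mixture
    # with component weights w_m * pi_mk, so the ensemble can be flattened.
    return [(w * p, mu, sd)
            for w, comps in zip(weights, models)
            for p, mu, sd in comps]


def climb(x, comps, tol=1e-8, max_iter=1000):
    # Modal EM fixed-point iteration for a Gaussian mixture: each step moves x
    # to a precision-weighted average of the component means, weighted by the
    # (unnormalised) posterior of each component at x. Assumes the density at
    # the starting point does not underflow to zero.
    for _ in range(max_iter):
        post = [p * normal_pdf(x, mu, sd) for p, mu, sd in comps]
        num = sum(q * mu / sd ** 2 for q, (_, mu, sd) in zip(post, comps))
        den = sum(q / sd ** 2 for q, (_, _, sd) in zip(post, comps))
        x_new = num / den
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def modal_clusters(data, comps, merge_tol=1e-3):
    # Points whose ascent paths end at the same mode share a cluster label.
    modes, labels = [], []
    for x in data:
        m = climb(x, comps)
        for j, mode in enumerate(modes):
            if abs(m - mode) < merge_tol:
                labels.append(j)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


# Toy data and hand-picked (not EM-fitted) models with K = 1, 2, 3 components.
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 4.0, 4.5, 5.0, 5.5, 6.0]
models = [
    [(1.0, 2.5, 2.6)],
    [(0.5, 0.0, 0.71), (0.5, 5.0, 0.71)],
    [(0.5, 0.0, 0.71), (0.25, 4.6, 0.6), (0.25, 5.4, 0.6)],
]
weights = ensemble_weights(data, models)
modes, labels = modal_clusters(data, ensemble_components(models, weights))
print([round(w, 3) for w in weights])
print([round(m, 2) for m in modes], labels)
```

On this toy input the sketch should give most of the weight to the two-component model while keeping small contributions from the others, and the modal step should recover the two groups with modes close to 0 and 5. Flattening the ensemble into a single Gaussian mixture is what lets one hill-climbing routine serve the whole convex combination.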