Title: Determining the number of overlapping clusters: Simulation results for the additive profile clustering model
Authors: Tom Frans Wilderjans - Leiden University (Netherlands) [presenting]
Julian Rossbroich - Leiden University (Netherlands)
Abstract: In various fields of science, researchers are interested in revealing the underlying structural mechanisms that generated object-by-variable data (e.g., patient-by-symptom or consumer-by-brand data). Based on theoretical or empirical arguments, it may be hypothesized that these mechanisms are captured by a clustering of the objects. To this end, researchers often adopt a partitioning method (e.g., $k$-means), which yields a set of non-overlapping clusters. In some cases, however, objects may be expected to group into clusters that are allowed to overlap (i.e., an object may belong to multiple clusters). For patient-by-symptom data, for example, the clusters may correspond to syndromes, and patients may suffer from multiple syndromes at the same time (i.e., comorbidity). To identify such overlapping object clusters, Mirkin's additive profile (overlapping) clustering model may be used. A non-trivial task is determining the optimal number of overlapping clusters underlying a given data set. Up to now, however, this model selection problem has not been studied systematically. Therefore, in an extensive simulation study, we compare various existing (e.g., CHull, cross-validation) and new model selection methods for the additive profile clustering model; some of the new methods are methods for the partitioning case adapted to the context of overlapping clustering (e.g., AIC, CH-index).
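To make the model selection problem concrete, the following is a minimal sketch, assuming the common formulation of the additive profile clustering model as $X \approx AP$, where $A$ is a binary objects-by-clusters membership matrix (a row may contain several 1s, so clusters overlap) and $P$ is a clusters-by-variables profile matrix. The fitting routine is a simple alternating least-squares scheme chosen for illustration and is not necessarily the algorithm used in this study. The names `all_patterns` and `fit_adproclus`, the simulated data, and the scree-ratio heuristic (a simplified, CHull-inspired ratio of successive loss decreases that assumes model complexity grows linearly in the number of clusters) are all hypothetical.

```python
import itertools
import numpy as np


def all_patterns(k):
    """All 2**k binary membership patterns (an object may belong to several clusters)."""
    return np.array(list(itertools.product([0, 1], repeat=k)), dtype=float)


def fit_adproclus(x, k, n_iter=50, seed=0):
    """Hypothetical ALS sketch for X ~ A @ P with binary A (overlap allowed).

    A is I x K (binary memberships), P is K x J (cluster profiles).
    Returns A, P and the loss history (sum of squared residuals).
    """
    rng = np.random.default_rng(seed)
    patterns = all_patterns(k)                      # shape (2**k, k)
    a = rng.integers(0, 2, size=(x.shape[0], k)).astype(float)
    history = []
    for _ in range(n_iter):
        # Profile update: least-squares P given the current memberships A.
        p, *_ = np.linalg.lstsq(a, x, rcond=None)
        # Membership update: each object gets the best of the 2**k patterns.
        candidates = patterns @ p                   # (2**k, J) possible row profiles
        dists = ((x[:, None, :] - candidates[None, :, :]) ** 2).sum(axis=2)
        a = patterns[dists.argmin(axis=1)]
        loss = float(((x - a @ p) ** 2).sum())
        history.append(loss)
        # Stop once the loss no longer decreases noticeably.
        if len(history) > 1 and history[-2] - loss < 1e-10:
            break
    return a, p, history


# Simulated data from the model with 3 overlapping clusters: X = A P + E.
rng = np.random.default_rng(1)
true_a = rng.integers(0, 2, size=(60, 3)).astype(float)
true_p = rng.normal(0, 3, size=(3, 8))
x = true_a @ true_p + rng.normal(0, 0.5, size=(60, 8))

# Best-of-5-starts loss for K = 1..5 clusters (ALS can hit local optima).
losses = {
    k: min(fit_adproclus(x, k, seed=s)[2][-1] for s in range(5))
    for k in range(1, 6)
}

# Simplified scree ratio: loss decrease before K divided by decrease after K.
ratios = {}
for k in range(2, 5):
    gain_before = losses[k - 1] - losses[k]
    gain_after = losses[k] - losses[k + 1]
    ratios[k] = gain_before / gain_after if gain_after > 0 else np.inf

for k in range(1, 6):
    print(k, round(losses[k], 1), ratios.get(k))
```

Because the membership and profile updates are each optimal given the other, the loss history of a single run cannot increase; the number of clusters with the largest scree ratio would be a candidate solution under this heuristic, which is only one of the kinds of criteria such a simulation study can compare.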