Title: Comparing EM to a greedy search algorithm to optimize ICL for mixture models
Authors: Arthur White - Trinity College Dublin (Ireland) [presenting]
Jason Wyse - Trinity College Dublin (Ireland)
Gilles Celeux - INRIA (France)
Abstract: The integrated complete-data likelihood (ICL) is a popular criterion in model-based clustering for choosing the number of clusters of a finite mixture model. Typically, the ICL is computed using a BIC-like approximation, which relies on maximum likelihood estimates obtained with the expectation-maximisation (EM) algorithm. Recently, an alternative method for clustering with the ICL has been introduced which calculates the exact ICL in closed form within a Bayesian framework. A greedy search (GS) algorithm is then used to allocate observations to clusters so as to maximise the ICL directly and hence obtain an optimal clustering solution. This approach has the added benefit of searching the model space, that is, the number of clusters, at the same time. To better understand the properties of the GS method, we conducted an extensive simulation study comparing its performance with that of the standard EM approach in terms of the number of clusters selected, clustering accuracy, and computational cost. The performance of the methods on real data is also discussed.
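The GS idea can be illustrated with a minimal sketch. It assumes a mixture of independent binary variables with conjugate Beta(a, b) priors on each cluster's success probabilities and a symmetric Dirichlet(alpha) prior on the mixing weights. These are illustrative assumptions chosen because they make the exact ICL, log p(X, z), available in closed form. The function names `exact_icl`, `greedy_search` and `relabel` and all their parameters are hypothetical; this is not the authors' implementation. Each observation is moved to whichever existing cluster most increases the exact ICL. A cluster that empties is dropped, so starting from `K_max` clusters the search also moves through the number of clusters, and no EM parameter estimates are required.

```python
import math
import random


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def relabel(z):
    """Map labels to 0..K-1 in order of first appearance, dropping empty clusters."""
    mapping = {}
    for k in z:
        if k not in mapping:
            mapping[k] = len(mapping)
    return [mapping[k] for k in z], len(mapping)


def exact_icl(X, z, K, alpha=1.0, a=1.0, b=1.0):
    """Exact ICL, log p(X, z), for an ASSUMED mixture of independent Bernoulli
    variables with Beta(a, b) priors and a symmetric Dirichlet(alpha) prior
    on the K mixing weights (all parameters integrated out analytically)."""
    n, d = len(X), len(X[0])
    counts = [0] * K
    sums = [[0] * d for _ in range(K)]
    for x, k in zip(X, z):
        counts[k] += 1
        for j, v in enumerate(x):
            sums[k][j] += v
    # log p(z): mixing weights integrated out (Dirichlet-multinomial)
    icl = math.lgamma(K * alpha) - math.lgamma(n + K * alpha)
    icl += sum(math.lgamma(c + alpha) - math.lgamma(alpha) for c in counts)
    # log p(X | z): success probabilities integrated out (Beta-Bernoulli)
    for k in range(K):
        for j in range(d):
            s = sums[k][j]
            icl += log_beta(a + s, b + counts[k] - s) - log_beta(a, b)
    return icl


def greedy_search(X, K_max=5, init=None, max_sweeps=50, seed=0):
    """Hypothetical greedy ICL search: move each observation to the cluster
    that most increases the exact ICL; emptied clusters are dropped, so K
    can only shrink from its starting value."""
    rng = random.Random(seed)
    n = len(X)
    z = list(init) if init is not None else [rng.randrange(K_max) for _ in range(n)]
    z, K = relabel(z)
    best = exact_icl(X, z, K)
    for _ in range(max_sweeps):
        improved = False
        for i in rng.sample(range(n), n):  # visit observations in random order
            current, best_k, best_val = z[i], z[i], best
            for k in range(K):
                if k == current:
                    continue
                z[i] = k
                cand_z, cand_K = relabel(z)  # drops the old cluster if it emptied
                val = exact_icl(X, cand_z, cand_K)
                if val > best_val:
                    best_k, best_val = k, val
            z[i] = best_k
            if best_k != current:
                z, K = relabel(z)
                best, improved = best_val, True
        if not improved:  # no single move raises the ICL: a local optimum
            break
    return z, K, best
```

For example, with ten rows of `[1, 1, 1, 1]` followed by ten rows of `[0, 0, 0, 0]` and a starting partition that splits the zeros across two clusters, `greedy_search(X, init=[0] * 10 + [1] * 5 + [2] * 5)` empties one of the zero clusters and returns K = 2. This sketch recomputes the ICL from scratch for every candidate move, which is simple but slow. A practical implementation would instead update the per-cluster counts incrementally. Like any greedy method, the result also depends on the starting partition.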