CMStatistics 2018: Start Registration
View Submission - CMStatistics
Title: A graph approach to find the best grouping for each possible number of clusters Authors:  Cristian Gatu - Alexandru Ioan Cuza University of Iasi (Romania) [presenting]
Cornel Barna - University of York (Canada)
Ana Colubi - Justus Liebig University Giessen (Germany)
Erricos John Kontoghiorghes - Cyprus University of Technology and Birkbeck University of London, UK (Cyprus)
Abstract: The unsupervised non-hierarchical clustering of a set of $n$ data points is a NP-problem. A theoretical approach to solve this problem is proposed. Specifically, a graph structure which can be employed to enumerate and evaluate all possibilities to cluster a number of observations is introduced. Any complete traversal of the graph generates all possible clustering solutions. The structure of the graph is exploited in order to design an efficient branch-and-bound algorithm that finds the optimal clustering solution without traversing the whole graph. A heuristic version of the branch-and-bound algorithm that reduces the execution time at the expense of the solution quality is also presented. In addition, a $p$-combination method that considers at each level of the graph not all, but only the best $p$ groupings is also investigated. The Ward method is a special case for $p=1$. This allows for trading between exploration and computational efficiency. Experimental results are presented and analyzed. The new theoretical solution gives an insight to the clustering problem that could be the foundation for further developments on clustering related problems.