View Submission - COMPSTAT2018
A0297
Title: Penalized k-Means
Authors: Patrick Groenen - Erasmus University Rotterdam (Netherlands) [presenting]
Mariko Takagishi - Doshisha University (Japan)
Yoshikazu Terada - Osaka University; RIKEN (Japan)
Abstract: A well-known problem in k-means clustering is the choice of the number of clusters. Several heuristic strategies have been proposed in the literature. We propose a penalty on the cluster memberships that determines the number of clusters automatically. The penalty applies the group lasso to the memberships of each cluster. In addition, generalized double Pareto (GDP) shrinkage is applied, which favors large clusters over small ones. The hyperparameter governing the penalty strength is determined through cross-validation. Although penalized k-means loses the property of a crisp clustering solution, GDP shrinkage seems to favor cluster memberships close to either zero or one. We propose an algorithm for penalized k-means that makes use of majorization and quadratic programming. Its performance is studied through a simulation study.
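
The abstract does not state the objective function, so the following is only a minimal sketch of how a group-lasso penalty on cluster memberships could be evaluated. It assumes a membership matrix U (one column per cluster), a membership-weighted squared-distance fit term, and a penalty equal to the sum of the Euclidean norms of the columns of U. The function names, the form of the fit term, and the parameter lam are illustrative assumptions, not the authors' formulation; GDP shrinkage, majorization, and the quadratic programming step are not shown.

```python
import numpy as np


def group_lasso_penalty(U):
    """Sum of column-wise Euclidean norms of the membership matrix U (n x k).

    Each column (one cluster's memberships) is treated as one group.
    An all-zero column contributes nothing, i.e. the cluster is dropped.
    """
    return np.linalg.norm(U, axis=0).sum()


def penalized_kmeans_loss(X, U, M, lam):
    """Assumed objective: membership-weighted squared distances plus penalty.

    X: (n, p) data, U: (n, k) memberships, M: (k, p) centroids,
    lam: penalty strength (hypothetical hyperparameter name).
    """
    # Squared distance from every point to every centroid, shape (n, k).
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    fit = (U * d2).sum()
    return fit + lam * group_lasso_penalty(U)


# Four points in two tight pairs, with a crisp two-cluster assignment.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
U = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
M = np.array([[0.0, 1.0], [10.0, 1.0]])
loss = penalized_kmeans_loss(X, U, M, lam=1.0)
```

Under these assumptions, a crisp partition with cluster sizes n_1, ..., n_k has penalty sqrt(n_1) + ... + sqrt(n_k), so placing all n points in one cluster costs sqrt(n) while two equal clusters cost sqrt(2n). In this sketch, the penalty therefore pushes toward fewer non-empty clusters as lam grows.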