View Submission - CMStatistics

B1344
**Title:** The K-modes and Laplacian K-modes algorithms for clustering
**Authors:** Miguel Carreira-Perpinan - University of California, Merced (United States) **[presenting]**

**Abstract:** Many clustering algorithms estimate a cluster centroid, such as K-means, K-medoids, or mean-shift, but no algorithm seems to exist that clusters data by returning exactly K meaningful modes. We propose a natural definition of a K-modes objective function by combining two powerful ideas in clustering: the explicit use of assignment variables (as in K-means), and the estimation of cluster centroids that are modes of each cluster's density estimate (as in mean-shift). The algorithm reduces to K-means in the limit of very large scales and to K-medoids in the limit of very small scales. Computationally, it is slightly slower than K-means but much faster than mean-shift or K-medoids. Unlike K-means, it can find centroids that are valid patterns, truly representative of a cluster, even when the clusters are nonconvex. We then extend this definition to the Laplacian K-modes objective function by regularizing K-modes with the graph Laplacian, which encourages similar assignments for nearby points (as in spectral clustering). The optimization alternates a convex assignment step with a mean-shift step. This yields meaningful density estimates for each cluster, even in challenging problems where the clusters have manifold structure, are highly nonconvex, or are high-dimensional, as with images or text. It also provides an out-of-sample mapping that predicts soft assignments for a new point, in effect a nonparametric model of the cluster posterior probabilities.
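The alternation described in the abstract (a hard assignment step as in K-means, followed by a mean-shift step that moves each centroid toward a mode of its cluster's kernel density estimate) can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the authors' reference implementation: the Gaussian kernel, the bandwidth parameter `sigma`, the optional `init` centroids, and the fixed iteration count are all assumptions made for the example.

```python
import numpy as np

def k_modes(X, K, sigma, n_iter=50, init=None, rng=None):
    """Sketch of a K-modes-style alternating loop (illustrative only).

    Alternates:
      1. a hard assignment step, giving each point to the centroid with
         the largest Gaussian affinity (equivalently, smallest distance);
      2. a mean-shift step per cluster, a kernel-weighted mean update
         that moves the centroid toward a mode of that cluster's
         Gaussian kernel density estimate.
    """
    rng = np.random.default_rng(rng)
    N = X.shape[0]
    if init is not None:
        C = np.asarray(init, dtype=float).copy()
    else:
        # Assumption: initialize centroids at K random data points.
        C = X[rng.choice(N, K, replace=False)].astype(float).copy()
    labels = np.zeros(N, dtype=int)
    for _ in range(n_iter):
        # Assignment step: with a shared bandwidth, the centroid with
        # maximal Gaussian affinity is the nearest centroid.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)  # (N, K)
        labels = np.argmin(d2, axis=1)
        # Mean-shift step: one kernel-weighted mean update per cluster,
        # computed only over the points currently assigned to it.
        for k in range(K):
            Xk = X[labels == k]
            if len(Xk) == 0:
                continue
            w = np.exp(-((Xk - C[k]) ** 2).sum(-1) / (2 * sigma**2))
            C[k] = (w[:, None] * Xk).sum(0) / w.sum()
    return labels, C
```

The bandwidth `sigma` plays the role of the scale discussed in the abstract: a very large `sigma` makes the weighted mean approach the plain cluster mean (K-means-like behavior), while a very small `sigma` concentrates the weight on the point nearest the centroid (K-medoids-like behavior).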