Title: Tools for clustering based on k-barycenters in the Wasserstein space
Authors: Hristo Valdes - University of Valladolid (Spain) [presenting]
Carlos Bea - University of Valladolid (Spain)
Eustasio del Barrio - University of Valladolid (Spain)
Abstract: Cluster analysis addresses the detection of groups in data sets. Within this admittedly vague description, model-based clustering aims to find groupings (clusters) whose shapes follow specified distributions. In this setting, the clusters provided by the method are described by probability distributions, often Gaussian, which can be regarded as elements of an abstract space. The L2 Wasserstein distance has attracted particular interest on such spaces, as it leads to a rich framework for developing statistical concepts in parallel with those known in Euclidean spaces. This is the case for k-barycenters, the abstract version of k-means, which is by far the most widely used method in clustering problems; k-barycenters have recently been introduced in the Wasserstein space, including in a robust (trimmed) version. We focus on the application of (trimmed) Wasserstein k-barycenters to some of the fundamental problems in cluster analysis. These include the parallelization and stabilization of procedures, and even the improvement of initial solutions for the algorithms involved in the methods. We also pay special attention to the meta-analysis tools arising from this robust aggregation procedure: stability (or coherence) criteria, applied to the resulting aggregation, give descriptive indications of the number of clusters or of the adequacy of the clustering procedure. We present illustrative examples of these concepts.
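To make the notion of trimmed k-barycenters concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest possible setting: each input cluster is a univariate Gaussian given as a (mean, standard deviation) pair. In that case the L2 Wasserstein distance between N(m1, s1^2) and N(m2, s2^2) is sqrt((m1 - m2)^2 + (s1 - s2)^2), and the equal-weight barycenter averages the means and the standard deviations, so a Lloyd-type iteration with trimming can be written in a few lines. All function names, the trimming level alpha, and the toy data are hypothetical and chosen only for illustration.

```python
import math


def w2_gauss_1d(p, q):
    """L2 Wasserstein distance between univariate Gaussians given as (mean, sd)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def barycenter_1d(gaussians):
    """Equal-weight Wasserstein barycenter of univariate Gaussians:
    a Gaussian whose mean and sd are the averages of the input means and sds."""
    n = len(gaussians)
    return (sum(g[0] for g in gaussians) / n, sum(g[1] for g in gaussians) / n)


def assign(gaussians, centers, n_keep):
    """Label each Gaussian with its nearest center; the inputs farthest
    from their nearest center (beyond the n_keep closest) get label -1."""
    scored = []
    for i, g in enumerate(gaussians):
        d, j = min((w2_gauss_1d(g, c), j) for j, c in enumerate(centers))
        scored.append((d, i, j))
    scored.sort()
    labels = [-1] * len(gaussians)
    for _, i, j in scored[:n_keep]:
        labels[i] = j
    return labels


def trimmed_k_barycenters(gaussians, init_centers, alpha=0.0, n_iter=20):
    """Lloyd-type iteration (illustrative sketch): assign, trim a fraction
    alpha of the inputs, then replace each center by the barycenter of its group."""
    n_keep = len(gaussians) - int(alpha * len(gaussians))
    centers = list(init_centers)
    for _ in range(n_iter):
        labels = assign(gaussians, centers, n_keep)
        new_centers = []
        for j, c in enumerate(centers):
            members = [g for g, lab in zip(gaussians, labels) if lab == j]
            # Keep the old center if a group becomes empty.
            new_centers.append(barycenter_1d(members) if members else c)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(gaussians, centers, n_keep)


# Toy data (hypothetical): two groups of similar Gaussians plus one outlier.
data = [(0.0, 1.0), (0.2, 1.1), (-0.1, 0.9),
        (5.0, 2.0), (5.2, 2.1), (4.9, 1.9),
        (20.0, 0.5)]
centers, labels = trimmed_k_barycenters(
    data, init_centers=[(0.0, 1.0), (5.0, 2.0)], alpha=0.15)
```

With alpha = 0.15, one of the seven inputs is trimmed, so the isolated Gaussian is labeled -1 and does not influence either barycenter. With alpha = 0 it would be assigned to the second group and pull that center toward it. In the multivariate Gaussian case the distance and the barycenter no longer reduce to Euclidean averages, so this shortcut does not carry over directly.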