CMStatistics 2020
Title: The BRIk and FABRIk algorithms for improving $k$-means clustering recovery
Authors: Aurora Torrente Orihuela - Universidad Carlos III de Madrid (Spain) [presenting]
Javier Albert Smet - Universidad Carlos III de Madrid (Spain)
Juan Romo - Universidad Carlos III de Madrid (Spain)
Abstract: The $k$-means algorithm is widely used in many research fields because it converges quickly to a minimum of its cost function; however, because it is sensitive to its initial conditions, it frequently gets stuck in local optima. The BRIk algorithm is a simple, computationally feasible and efficient method that provides $k$-means with a set of initial seeds for clustering datasets of arbitrary dimension. In terms of clustering recovery, it drastically improves $k$-means results compared with other widely used initialization procedures. It relies on clustering a set of tighter (and thus more easily separated) centroids derived from bootstrap replicates of the data, and on the versatile Modified Band Depth to identify the deepest point of each cluster.

FABRIk, in turn, is a recent functional-data extension of BRIk for longitudinal data: appropriate B-splines are fitted to the observations, and a resampling process is used to handle issues such as noise or missing data. On simulated and real datasets, FABRIk outperforms the alternative techniques, including BRIk and functional-data versions of its competitors.
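The BRIk seeding idea summarized above can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: plain Lloyd's k-means, a fixed number of bootstrap replicates (`n_boot`), k-means (rather than another method) to group the pooled bootstrap centroids, and the modified band depth with bands formed by pairs of curves (J = 2), where each d-dimensional centroid is read as a curve over its coordinates. The names `kmeans`, `modified_band_depth`, `_distinct` and `brik_seeds` are hypothetical. This is not the authors' implementation, and FABRIk's B-spline fitting and resampling steps are not covered.

```python
import random


def kmeans(data, seeds, iters=100):
    """Lloyd's k-means started from the given seeds; returns (centroids, labels)."""
    centroids = [tuple(c) for c in seeds]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(len(centroids)),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])))
                  for x in data]
        # Update step: cluster means (an empty cluster keeps its old centroid).
        new = []
        for j, old in enumerate(centroids):
            members = [x for x, lab in zip(data, labels) if lab == j]
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else old)
        if new == centroids:
            break
        centroids = new
    return centroids, labels


def modified_band_depth(curves):
    """Modified band depth (bands from pairs of curves, J = 2) of each curve.

    Each d-dimensional point is read as a curve observed at its d coordinates.
    Brute force: cubic in the number of curves.
    """
    n, d = len(curves), len(curves[0])
    n_pairs = n * (n - 1) / 2
    depths = []
    for x in curves:
        inside = 0
        for i in range(n):
            for j in range(i + 1, n):
                # Count coordinates where x lies inside the band of curves i and j.
                inside += sum(min(a, b) <= v <= max(a, b)
                              for a, b, v in zip(curves[i], curves[j], x))
        depths.append(inside / (n_pairs * d))
    return depths


def _distinct(rows):
    """Unique rows (as tuples), in first-seen order."""
    return list(dict.fromkeys(tuple(r) for r in rows))


def brik_seeds(data, k, n_boot=20, seed=0):
    """Simplified BRIk-style seeding (a sketch, not the authors' implementation)."""
    rng = random.Random(seed)
    n = len(data)
    pooled = []
    for _ in range(n_boot):
        # 1. Bootstrap replicate of the data, clustered by k-means from random seeds.
        replicate = [tuple(data[rng.randrange(n)]) for _ in range(n)]
        centroids, _ = kmeans(replicate, rng.sample(_distinct(replicate), k))
        pooled.extend(centroids)
    # 2. Group the pooled bootstrap centroids into k clusters (k-means: an assumption).
    centres, labels = kmeans(pooled, rng.sample(_distinct(pooled), k))
    seeds = []
    for j in range(k):
        group = [c for c, lab in zip(pooled, labels) if lab == j]
        if len(group) < 2:
            # Degenerate group: fall back to its single member or its k-means centre.
            seeds.append(group[0] if group else centres[j])
            continue
        # 3. Seed = deepest member of the group according to the modified band depth.
        depth = modified_band_depth(group)
        seeds.append(group[depth.index(max(depth))])
    return seeds
```

For example, `kmeans(data, brik_seeds(data, k=3))` clusters `data` (a list of equal-length numeric tuples) starting from BRIk-style seeds. The brute-force depth computation is only meant for small illustrative examples, and `brik_seeds` assumes that every bootstrap replicate contains at least `k` distinct rows.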