CMStatistics 2022
Title: A parallelizable model-based approach for marginal and multivariate clustering
Authors: Miguel de Carvalho - FCiencias.ID - Associacao para a Investigacao e Desenvolvimento de Ciencias (Portugal)
Gabriel Martos - Fundacion Universidad Torcuato Di Tella (Argentina)
Andrej Svetlosak - University of Edinburgh (United Kingdom) [presenting]
Abstract: This paper develops a clustering method that takes advantage of the sturdiness of model-based clustering while attempting to mitigate some of its pitfalls. First, we note that standard model-based clustering likely leads to the same number of clusters per margin, which seems a rather artificial assumption for a variety of datasets. We tackle this issue by specifying a finite mixture model per margin, which allows each margin to have a different number of clusters, and then cluster the multivariate data using a strategy-game-inspired algorithm that we call Reign-and-Conquer. Second, since the proposed clustering approach only specifies a model for the margins (leaving the joint unspecified), it has the advantage of being partially parallelizable; hence, the proposed approach is computationally appealing as well as more tractable for moderate to high dimensions than a 'full' (joint) model-based clustering approach. A battery of numerical experiments on artificial data indicates overall good performance of the proposed methods in a variety of scenarios, and real datasets are used to showcase their application in practice.
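The per-margin idea can be illustrated with a minimal sketch. It is not the authors' implementation and does not attempt the Reign-and-Conquer step, which the abstract does not specify. It assumes univariate Gaussian mixtures fitted by EM, with the number of components per margin chosen by BIC; both choices, and all names (`fit_gmm_1d`, `cluster_margin`, `cluster_margins`, `VAR_FLOOR`), are hypothetical. Because each margin is fitted independently, the fits can be dispatched to separate workers.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

VAR_FLOOR = 1e-3  # assumption: keeps a component from collapsing onto one point


def normal_pdf(v, mean, var):
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(x, weights, means, variances):
    """Return the log-likelihood and the per-point responsibilities."""
    loglik, resp = 0.0, []
    for v in x:
        dens = [w * normal_pdf(v, m, s2)
                for w, m, s2 in zip(weights, means, variances)]
        total = max(sum(dens), 1e-300)  # guard against log(0)
        loglik += math.log(total)
        resp.append([d / total for d in dens])
    return loglik, resp


def fit_gmm_1d(x, k, n_iter=50):
    """Fit a k-component univariate Gaussian mixture by EM."""
    n = len(x)
    xs = sorted(x)
    grand_mean = sum(x) / n
    grand_var = max(sum((v - grand_mean) ** 2 for v in x) / n, VAR_FLOOR)
    weights = [1.0 / k] * k
    # Deterministic start: means at evenly spaced sample quantiles.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [grand_var] * k
    for _ in range(n_iter):
        _, resp = e_step(x, weights, means, variances)
        for j in range(k):  # M-step: weighted moments per component
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj
            variances[j] = max(var_j, VAR_FLOOR)
    # Final E-step so the log-likelihood matches the returned parameters.
    return e_step(x, weights, means, variances)


def cluster_margin(x, max_k=4):
    """Choose k by BIC for one margin; return (k, hard labels)."""
    best = None
    for k in range(1, max_k + 1):
        loglik, resp = fit_gmm_1d(x, k)
        bic = -2.0 * loglik + (3 * k - 1) * math.log(len(x))  # 3k - 1 free parameters
        if best is None or bic < best[0]:
            best = (bic, k, [r.index(max(r)) for r in resp])
    return best[1], best[2]


def cluster_margins(columns, max_k=4):
    """Fit every margin independently. Threads keep the sketch simple;
    a process pool would be needed for a real speed-up in CPython."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda col: cluster_margin(col, max_k), columns))


# Synthetic data: margin A has two well-separated groups, margin B has three.
random.seed(0)
n = 300
margin_a = [(0.0, 10.0)[i % 2] + random.gauss(0, 1) for i in range(n)]
margin_b = [(-8.0, 0.0, 8.0)[i % 3] + random.gauss(0, 1) for i in range(n)]
(k_a, labels_a), (k_b, labels_b) = cluster_margins([margin_a, margin_b])
print("clusters per margin:", k_a, k_b)
```

The point of the sketch is only that the selected number of components is free to differ between margins (`k_a` and `k_b` are chosen separately) and that no joint model is ever fitted. Turning the marginal labels into a multivariate clustering is the job of the authors' Reign-and-Conquer algorithm and is left out here.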