CMStatistics 2018
Title: A targeted multi-partitions clustering
Authors: Matthieu Marbac - CREST - ENSAI (France) [presenting]
Christophe Biernacki - Inria (France)
Mohammed Sedki - Paris-Sud University, Inserm, Pasteur, UVSQ (France)
Vincent Vandewalle - Inria (France)
Abstract: Clustering is generally not an end in itself, because its results are mainly tools that the statistician uses in a further analysis. Indeed, in many applications, clusters are estimated from a set of observed variables and are then used to predict other variables, which may or may not have been used in the clustering. Because the final prediction objective is not taken into account during the cluster analysis, there is no guarantee that the resulting clusters are relevant for the variables to predict. We present a unified approach which performs cluster analysis and prediction simultaneously. This method assumes that the variables to cluster arise from a product of finite mixture models, which provides multiple partitions. Moreover, the variables to predict are assumed to be independent of the variables to cluster given the partition. The predictions are obtained with a generalized linear model. Model selection is conducted by optimizing the BIC. This optimization is achieved with a modified version of the EM algorithm which performs model selection and maximum likelihood inference simultaneously.
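The model described in the abstract can be sketched in formulas as follows. This is only an illustrative reading: all notation is an assumption introduced here and does not come from the abstract. This includes the block structure x_i^{(b)}, the labels z_{ib}, the link function h, the dummy coding of the cluster labels in the linear predictor, and the sign convention under which the BIC is maximized. The authors' actual parameterization may differ.

```latex
% Hypothetical notation (not taken from the abstract):
%   x_i      : variables to cluster for observation i, split into B blocks
%              x_i^{(1)}, ..., x_i^{(B)}
%   z_{ib}   : cluster label of observation i in partition b, in {1, ..., G_b}
%   y_i      : variable to predict

% Product of finite mixtures: one mixture, hence one partition, per block
p(x_i \mid \theta)
  = \prod_{b=1}^{B} \sum_{g=1}^{G_b} \pi_{bg}\,
    f_{bg}\!\left(x_i^{(b)} \mid \alpha_{bg}\right)

% y_i is independent of x_i given the labels z_i = (z_{i1}, ..., z_{iB})
p(x_i, y_i \mid \theta, \beta)
  = \sum_{z_i} \left[ \prod_{b=1}^{B} \pi_{b z_{ib}}\,
    f_{b z_{ib}}\!\left(x_i^{(b)} \mid \alpha_{b z_{ib}}\right) \right]
    p(y_i \mid z_i; \beta)

% Generalized linear model for y_i, with an assumed link h and
% cluster 1 of each partition taken as the reference level
h\big(\mathbb{E}[y_i \mid z_i]\big)
  = \beta_0 + \sum_{b=1}^{B} \sum_{g=2}^{G_b} \beta_{bg}\, \mathbb{1}\{z_{ib} = g\}

% BIC of a candidate model m with \nu_m free parameters and n observations,
% written here in the form that is maximized
\mathrm{BIC}(m) = \ell_m\big(\hat\theta_m, \hat\beta_m\big) - \frac{\nu_m}{2} \log n
```

Under this reading, summing the joint density over y_i recovers the product of mixtures in the first display, so the prediction part changes which partitions are selected without changing the form of the clustering model.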