CMStatistics 2022
B1302
Title: Invariant coordinate selection as preprocessing for clustering
Authors: Aurore Archimbaud - Erasmus University Rotterdam (Netherlands) [presenting]
Andreas Alfons - Erasmus University Rotterdam (Netherlands)
Klaus Nordhausen - University of Helsinki (Finland)
Anne Ruiz-Gazen - Toulouse School of Economics (France)
Abstract: Dimension reduction is an important preprocessing step in multivariate analysis and is likely to improve the identification of clusters. Principal Component Analysis (PCA) is one of the best-known dimension reduction techniques, but it may not be the best choice for clustering purposes. An alternative approach, Invariant Coordinate Selection (ICS), relies on the simultaneous diagonalization of two scatter matrices. It goes beyond PCA by finding directions of interest through the optimization of generalized kurtosis measures, and it returns affine invariant components. Two steps are challenging: the choice of the pair of scatter matrices and the selection of the components to retain. Theoretical results already guarantee that, under some elliptical mixture models, the structure of the data can be highlighted on a subset of the first and/or last components. Nevertheless, ICS has so far received little attention for clustering tasks. We evaluate the performance of several well-known clustering algorithms with ICS as a preprocessing step. We consider different combinations of scatter matrices and component selection approaches, as well as the impact of outliers, on simulated and benchmark data sets.
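The simultaneous diagonalization behind ICS can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the classical scatter pair: the covariance matrix as S1 and the fourth-moment scatter matrix COV4 as S2. The function names `cov`, `cov4` and `ics` are hypothetical. ICS looks for a matrix B with B S1 B^T = I and B S2 B^T diagonal. The diagonal entries are the generalized kurtosis values rho(b) = b^T S2 b / b^T S1 b, which are sorted in decreasing order.

```python
import numpy as np


def cov(X):
    """Sample covariance matrix, used here as the first scatter matrix S1."""
    return np.cov(X, rowvar=False)


def cov4(X):
    """Fourth-moment scatter matrix, used here as the second scatter matrix S2.

    Each centered observation is weighted by its squared Mahalanobis distance
    with respect to the covariance matrix. The 1/(p + 2) factor makes it
    consistent for the covariance matrix at the multivariate normal model.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    r2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(cov(X)), Xc)
    return (Xc * r2[:, None]).T @ Xc / (n * (p + 2))


def ics(X, scatter1=cov, scatter2=cov4):
    """Invariant coordinate selection via simultaneous diagonalization (sketch).

    Returns (eigval, B, Z): generalized kurtosis values in decreasing order,
    the unmixing matrix B (one direction per row) with B S1 B^T = I and
    B S2 B^T = diag(eigval), and the invariant coordinates Z.
    """
    S1, S2 = scatter1(X), scatter2(X)
    # Whiten with respect to S1 using its Cholesky factor S1 = L L^T.
    Linv = np.linalg.inv(np.linalg.cholesky(S1))
    # Symmetric eigenproblem equivalent to the generalized one S2 b = rho S1 b.
    eigval, V = np.linalg.eigh(Linv @ S2 @ Linv.T)
    order = np.argsort(eigval)[::-1]  # eigh sorts ascending; ICS uses decreasing
    eigval, V = eigval[order], V[:, order]
    B = V.T @ Linv  # row k satisfies S2 b_k = eigval[k] * S1 b_k
    Z = (X - X.mean(axis=0)) @ B.T
    return eigval, B, Z


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    labels = rng.integers(0, 2, size=500)
    X = rng.normal(size=(500, 4))
    X[:, 0] += np.where(labels == 1, 4.0, -4.0)  # two groups along one axis
    eigval, B, Z = ics(X)
    print(np.round(eigval, 2))  # the smallest value belongs to the bimodal direction
```

The components are affine invariant, so any invertible linear transformation of the data yields the same components up to sign. With this scatter pair, two balanced and well-separated groups typically appear on the last component, which has the smallest kurtosis value. A small group or outliers tend to appear on the first components. Deciding which components to retain is still the selection step described in the abstract.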