CMStatistics 2022
B1068
Title: Subspace independent component analysis: Finding clustering structures in a low dimensional space
Authors: Jeffrey Durieux - Erasmus University (Netherlands) [presenting]
Abstract: K-means is a widely used clustering method. However, applying K-means to a data set may fail to uncover clusters due to the presence of masking variables and the curse of dimensionality. A common workaround is to apply PCA to the data before performing cluster analysis, a practice called Tandem Analysis (TA). A vulnerability of TA is that PCA is not guaranteed to preserve the cluster structure present in the original data, which jeopardizes the usefulness of the subsequent cluster analysis. Multiple authors have proposed procedures that reduce the dimensionality of a data set and perform cluster analysis on the reduced data, all aiming to find suitable low-dimensional representations of the data while keeping cluster structures intact. We present a novel approach, called Subspace Independent Component Analysis (SICA), to reducing dimensionality and performing cluster analysis on the low-dimensional representation of the data. The method is described and thoroughly tested in systematically manipulated simulation studies in which we compare it to related methods. Results show that SICA is a fast procedure that extracts components that preserve cluster structures, but that its performance depends on characteristics of the data. In addition, the correctness of the clusterings obtained through SICA is high, although it does not always outperform currently available methods.
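As an illustration of the Tandem Analysis baseline described in the abstract (PCA followed by K-means), a minimal NumPy sketch might look as follows. This is not SICA, whose details the abstract does not give. The function names, the toy data, and all parameter values (cluster means, noise scale, number of masking variables) are assumptions chosen only for illustration.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project centered data onto its leading principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; initial centroids are random data points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels

def tandem_analysis(X, n_components, k, seed=0):
    """Tandem Analysis: PCA first, then K-means on the component scores."""
    scores = pca_scores(X, n_components)
    return kmeans(scores, k, seed=seed), scores

# Hypothetical toy data: two clusters separated on 2 variables,
# plus 8 high-variance masking variables that carry no cluster information.
rng = np.random.default_rng(42)
n_per_cluster = 100
signal = np.vstack([
    rng.normal(-2.0, 1.0, size=(n_per_cluster, 2)),
    rng.normal(2.0, 1.0, size=(n_per_cluster, 2)),
])
masking = rng.normal(0.0, 5.0, size=(2 * n_per_cluster, 8))
X = np.hstack([signal, masking])

labels, scores = tandem_analysis(X, n_components=2, k=2)
```

In a setup like this, the masking variables have larger variance than the cluster-separating variables, so the leading principal components are likely to be driven by noise. That is the vulnerability of TA noted above: the retained components need not preserve the cluster structure.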