View Submission - COMPSTAT2018
Title: A dynamic approach for clustering of variables in high-dimensional analysis
Authors: Christian Derquenne - EDF Research and Development (France) [presenting]
Abstract: Finding structure in data is an essential aid to understanding the phenomena under analysis. We have previously proposed a set of methods for clustering numeric variables with linear or non-linear links. For high-dimensional data (many variables and many individuals), we propose to adapt these methods using different divide-and-conquer strategies. First, a random sample of individuals is drawn and its variables are split into random groups. The sample of individuals and the groups of variables are kept to reasonable sizes. A clustering method is applied to each group, and each resulting cluster is represented by its first principal component. The same clustering method is then applied to all the first principal components, yielding a final typology that groups the initial variables. However, this process uses only one random sample of individuals. We therefore evaluate the quality of the results and extend them with different strategies depending on that quality. Finally, a multiple correspondence analysis is applied to obtain a final typology using all the data. This approach has been applied to simulated and real data.
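The first pass described in the abstract (sample individuals, split variables into random groups, cluster each group, summarize each cluster by its first principal component, then cluster those components) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the author's implementation: the abstract does not name the clustering method, so average-linkage hierarchical clustering on 1 - |correlation| is used here as a stand-in, and all function names and parameters (`cluster_variables`, `first_pc`, `two_stage_clustering`, `n_sample`, `group_size`, `k_local`, `k_final`) are hypothetical. The quality evaluation and the final multiple correspondence analysis step are not covered, and variables are assumed to be non-constant.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_variables(X, n_clusters):
    """Stand-in variable clustering (an assumption, not the abstract's method):
    average linkage on the dissimilarity 1 - |correlation| between columns."""
    if X.shape[1] == 1:
        return np.ones(1, dtype=int)
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)
    dist = np.clip((dist + dist.T) / 2.0, 0.0, None)  # enforce symmetry, no negatives
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def first_pc(X):
    """Scores of the individuals on the first principal component of X."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, 0] * s[0]


def two_stage_clustering(X, n_sample, group_size, k_local, k_final, seed=0):
    """One pass on a single random sample of individuals."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rows = rng.choice(n, size=min(n_sample, n), replace=False)
    S = X[rows]

    # Split the variables into random groups of reasonable size.
    perm = rng.permutation(p)
    groups = [perm[i:i + group_size] for i in range(0, p, group_size)]

    # Stage 1: cluster within each group; keep one component per cluster.
    components, members = [], []
    for g in groups:
        labels = cluster_variables(S[:, g], min(k_local, len(g)))
        for c in np.unique(labels):
            idx = g[labels == c]
            components.append(first_pc(S[:, idx]))
            members.append(idx)

    # Stage 2: cluster the components, then map labels back to the variables.
    C = np.column_stack(components)
    top = cluster_variables(C, min(k_final, C.shape[1]))
    final = np.empty(p, dtype=int)
    for lab, idx in zip(top, members):
        final[idx] = lab
    return final


# Simulated example: 12 variables driven by 3 latent factors (4 variables each).
rng = np.random.default_rng(42)
factors = rng.normal(size=(500, 3))
X = np.repeat(factors, 4, axis=1) + 0.1 * rng.normal(size=(500, 12))
labels = two_stage_clustering(X, n_sample=200, group_size=6,
                              k_local=3, k_final=3, seed=0)
print(labels)
```

In this toy setting the variables generated from the same latent factor should end up with the same final label. Repeating the call with different `seed` values gives one simple way to look at how stable the typology is across random samples, which is the concern the abstract raises about relying on a single sample.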