View Submission - SDS2022
A0176
Title: Biplots in dimension reduction and clustering
Authors: Michel van de Velden - Erasmus University Rotterdam (Netherlands) [presenting]
Alfonso Iodice D'Enza - University of Naples Federico II (Italy)
Angelos Markos - Democritus University of Thrace (Greece)
Abstract: In unsupervised learning, dimension reduction (e.g., PCA) and distance-based clustering are often applied sequentially: the distances used to cluster the observations are computed on the reduced dimensions. Because the dimension reduction step ignores any cluster structure in the data, it can be detrimental to the clustering step. Methods for joint dimension reduction and clustering combine the two in a single optimization problem, which is solved by iterative procedures that alternate between the two steps. As with principal component methods, different approaches have been proposed for continuous, categorical, and mixed-type data. In particular, for continuous data, reduced K-means combines principal component analysis with K-means clustering; for categorical data, cluster correspondence analysis combines correspondence analysis with K-means; for mixed-type data, mixed reduced K-means combines factor analysis for mixed data with K-means. The biplot visualization of the solution is of particular interest for interpretation: the low-dimensional map, which shows observations, cluster centroids, and variables together, can be very helpful for cluster characterization. We illustrate the use of biplots in the context of dimension reduction and clustering.
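
The alternating scheme described in the abstract can be illustrated for the continuous case with a minimal sketch of reduced K-means. This is not the authors' implementation: the function name `reduced_kmeans`, its parameters, the random initialization, and the handling of empty clusters are all assumptions made for illustration. The sketch assumes the usual least-squares formulation, in which the loadings are the leading eigenvectors of the between-cluster cross-product matrix and observations are reassigned to the nearest centroid in the reduced space.

```python
import numpy as np

def reduced_kmeans(X, n_clusters, n_dims, n_iter=100, seed=0):
    """Illustrative reduced K-means (hypothetical helper, not a library API)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                      # column-centre the data
    n = X.shape[0]
    labels = rng.integers(n_clusters, size=n)   # assumption: random start
    for _ in range(n_iter):
        # Cluster means in the full space; an empty cluster is re-seeded
        # with a randomly chosen observation (an assumption of this sketch).
        means = np.vstack([
            X[labels == k].mean(axis=0) if np.any(labels == k)
            else X[rng.integers(n)]
            for k in range(n_clusters)
        ])
        # Dimension reduction step: leading eigenvectors of the
        # between-cluster cross-product matrix (PX)'(PX).
        fitted = means[labels]
        _, eigvecs = np.linalg.eigh(fitted.T @ fitted)
        A = eigvecs[:, ::-1][:, :n_dims]        # eigh sorts ascending
        # Clustering step: reassign each observation to the nearest
        # centroid in the reduced space.
        scores = X @ A
        centroids = means @ A
        d2 = ((scores[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                               # assignments are stable
        labels = new_labels
    return labels, A, scores, centroids

# Usage on synthetic data: two groups separated along the first variable.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 5))
X_demo[:50, 0] += 10.0
labels, A, scores, centroids = reduced_kmeans(X_demo, n_clusters=2, n_dims=2)
```

The returned quantities are the ingredients of a biplot: the rows of `scores` give the observation points, the rows of `centroids` give the cluster centres, and the rows of `A` give the variable directions, typically drawn as arrows from the origin. Like K-means itself, the procedure can stop in a local optimum, so in practice several random starts would be compared.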