CMStatistics 2018
B1625
Title: Ignoring the differences in model properties of sparse PCA and standard PCA can be dangerous and misguide practice
Authors: Soogeun Park - Tilburg University (Netherlands) [presenting]
Katrijn Van Deun - Tilburg University (Netherlands)
Eva Ceulemans - University of Leuven (Belgium)
Abstract: Principal component analysis (PCA) is a widely used data reduction technique that finds a weights matrix which orthogonally transforms variables into components with maximal variance. PCA has the special property that this weights matrix is equivalent to a loadings matrix, which represents the variable-component correlations. Furthermore, PCA is intrinsically linked to the eigenvalue decomposition, with loadings and weights being equal to the eigenvectors of the correlation matrix. Sparse PCA, devised to improve the interpretability of PCA, introduces sparsity to either the weights or the loadings matrix, at the cost of this property: weights and loadings are no longer equivalent in sparse PCA. However, most researchers appear to have assumed, without closer inspection, that sparse PCA has the same modeling characteristics as PCA. This has led to misguided practices in research, such as generating data for simulation studies from simplistic PCA models composed of sparse eigenvectors, and the naive use of PCA-based initial values. These mistakes are brought to light and suggestions are made to fix them. The aim is to contribute to shifting research attention towards the statistical models of sparse PCA.
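
The loss of the weights-loadings equivalence can be illustrated numerically. The following is a minimal sketch and not the authors' method. It assumes that loadings are defined as the least-squares coefficients P in X ≈ (XW)Pᵀ, which is one common convention, and it uses a crude stand-in for sparse PCA: each eigenvector is truncated to its largest entries and rescaled to unit length. The function names, the simulated data, and the parameter `m` are hypothetical and chosen only for illustration.

```python
import numpy as np


def regression_loadings(X, W):
    """Least-squares loadings P minimising ||X - (X W) P^T||_F for given weights W."""
    T = X @ W  # component scores
    return X.T @ T @ np.linalg.inv(T.T @ T)


def demo(seed=0, n=200, p=6, k=2, m=3):
    """Compare weights and loadings for PCA and for truncated (sparse) weights.

    All settings here are illustrative assumptions, not values from the abstract.
    """
    rng = np.random.default_rng(seed)

    # Simulate correlated variables, then standardise them.
    X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
    X -= X.mean(axis=0)
    X /= X.std(axis=0)

    # PCA weights: top-k eigenvectors of the correlation matrix.
    R = X.T @ X / n
    _, eigvecs = np.linalg.eigh(R)      # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :k]         # reorder to descending, keep k
    P = regression_loadings(X, W)

    # Crude sparse weights: keep the m largest entries per column, rescale.
    W_sparse = np.zeros_like(W)
    for j in range(k):
        keep = np.argsort(np.abs(W[:, j]))[-m:]
        W_sparse[keep, j] = W[keep, j]
    W_sparse /= np.linalg.norm(W_sparse, axis=0)
    P_sparse = regression_loadings(X, W_sparse)

    return W, P, W_sparse, P_sparse


if __name__ == "__main__":
    W, P, W_sparse, P_sparse = demo()
    print("PCA: weights equal loadings?   ", np.allclose(W, P))
    print("Sparse: weights equal loadings?", np.allclose(W_sparse, P_sparse))
    print("Non-zeros in sparse weights / loadings:",
          np.count_nonzero(W_sparse), "/", np.count_nonzero(P_sparse))
```

Under these assumptions, the PCA weights and loadings coincide, while the loadings implied by the truncated weights generally differ from those weights and are not sparse. This is why a model with sparse weights and a model with sparse loadings should be treated as distinct.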