Title: Sparse and robust PLS for regression and binary classification
Authors: Peter Filzmoser - Vienna University of Technology (Austria) [presenting]
Irene Hoffmann - Vienna University of Technology (Austria)
Sven Serneels - BASF Corporation (United States)
Christophe Croux - KU Leuven (Belgium)
Kurt Varmuza - Vienna University of Technology (Austria)
Abstract: Partial least squares (PLS) regression is successfully used to regress a univariate response on a potentially large number of explanatory variables. PLS can also be used in a high-dimensional two-group discrimination setting; in this case the response is a binary variable representing the two groups. The key idea is to reduce the dimensionality of the regressors by projection to latent structures. Since both the dimension reduction and the regression step are sensitive to outlying observations and heavy-tailed distributions, a robust method called Partial Robust M-estimation (PRM) has been introduced which robustifies both steps. In high-dimensional regression or classification problems, variable selection is frequently desired, since it simplifies the interpretation of the resulting model and stabilizes the prediction model. For this reason, a sparse version of PLS has been introduced which yields a regression coefficient vector containing zeros. We propose a robust version, called sparse PRM, which turns out to be very useful in high-dimensional regression problems in the presence of data artifacts. This method has been modified to work for binary classification problems as well. Bootstrap-based inference allows the significant predictors to be identified.