Title: Pre-processing with orthogonal decompositions for high-dimensional explanatory variables
Authors: Cheng Yong Tang - Temple University (United States) [presenting]
Abstract: It is well known that strong correlations between explanatory variables are problematic for high-dimensional regularized regression methods. When the irrepresentable condition is violated, the popular lasso method may falsely include non-contributing variables. We propose preprocessing orthogonal decompositions (PROD) for the explanatory variables in high-dimensional regressions. The PROD procedure is built on a generic orthogonal decomposition of the design matrix. We investigate in detail three specific cases of the PROD: one based on conventional principal component analysis, one based on a novel optimization that incorporates the impact of the response variable, and one based on random projections. The PROD can be flexibly adapted to take multiple objectives into consideration, such as alleviating strong correlations between the explanatory variables while avoiding an increase in the variance of the resulting estimator. Extensive numerical studies with simulations and data analysis show that the PROD improves the performance of high-dimensional penalized regression. Our theoretical analysis also confirms its effect and benefit for high-dimensional regularized regression methods.
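The abstract does not spell out the construction, so the following is only a minimal sketch of one plausible reading of the principal-component case, not the authors' implementation. It assumes the centered design matrix is split into its projection onto the top k principal directions and the remainder, and it checks that the two pieces are orthogonal and that the remainder has weaker pairwise correlations. The function names, the one-factor simulated design, and the choice k = 1 are all hypothetical, introduced here for illustration; the subsequent penalized regression step is not shown.

```python
import numpy as np


def pca_orthogonal_decomposition(X, k):
    """Split centered X into X_low + X_res with X_low.T @ X_res = 0.

    Assumption (not from the abstract): X_low is the projection of the
    centered design onto its top-k principal directions.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions (right singular vectors).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:k].T                # p x k matrix of leading directions
    X_low = Xc @ Vk @ Vk.T       # component explained by the top-k directions
    X_res = Xc - X_low           # remainder, orthogonal to X_low
    return X_low, X_res


def max_abs_offdiag_corr(M):
    """Largest absolute pairwise correlation between columns of M."""
    C = np.corrcoef(M, rowvar=False)
    np.fill_diagonal(C, 0.0)
    return np.abs(C).max()


# Hypothetical simulated design: one shared latent factor makes all
# columns strongly correlated, with more variables than observations.
rng = np.random.default_rng(0)
n, p, k = 100, 200, 1
factor = rng.normal(size=(n, k))
loadings = rng.normal(loc=2.0, scale=0.2, size=(k, p))
X = factor @ loadings + rng.normal(size=(n, p))

X_low, X_res = pca_orthogonal_decomposition(X, k)

print("max |corr| in original design:", max_abs_offdiag_corr(X))
print("max |corr| in residual part:  ", max_abs_offdiag_corr(X_res))
print("orthogonal pieces:", np.allclose(X_low.T @ X_res, 0.0, atol=1e-6))
```

In this sketch the residual part plays the role of the decorrelated explanatory variables. How many components to remove, and how the removed component re-enters the regression, are design choices the abstract leaves open.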