Title: Hypothesis testing when fitting simple models to high-dimensional data
Authors: Lukas Steinberger - University of Vienna (Austria) [presenting]
Hannes Leeb - University of Vienna (Austria)
Abstract: In a linear regression problem with a huge number of potentially important explanatory variables, it is often desirable to select only a few regressors, even if it is not a priori evident that most regressors are truly irrelevant or at least practically negligible. If such a simple working model is maintained even though the true data-generating mechanism is much more complex, the subset regression may be neither linear nor homoskedastic, due to the omission of important variables, even if the full model is both. Classical linear regression techniques may therefore be inappropriate for the working model. In contrast to these well-known issues, we show that if the number of available explanatory variables in the full model is very large, then the classical F-test, based on observations of only the few regressors of interest, is approximately valid for testing the significance of these regressors for prediction. We therefore conclude that although a small set of regressors with high predictive power may not exist, or may be computationally prohibitive to find, it is still possible to assess the predictive power of any given choice of regressors.
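
As an illustration (not part of the abstract), the classical F-test of joint significance for a small working model can be sketched as follows. The design, dimensions, and coefficient scaling are hypothetical choices made only to mimic the setting described above: a full model with many regressors of which the analyst observes and tests just a few.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical setup: a "full model" with many regressors (p_full),
# of which only a few (p_work) are observed in the working model.
n, p_full, p_work = 200, 1000, 3
X_full = rng.standard_normal((n, p_full))
beta = rng.standard_normal(p_full) / np.sqrt(p_full)  # many small effects
y = X_full @ beta + rng.standard_normal(n)

# Working model: only the first p_work regressors are available.
X = X_full[:, :p_work]

# Classical F-test of H0: all working-model slope coefficients are zero,
# in the OLS regression of y on X with an intercept.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
rss = np.sum((y - X1 @ beta_hat) ** 2)   # residual SS, working model
rss0 = np.sum((y - y.mean()) ** 2)       # residual SS, intercept only
F = ((rss0 - rss) / p_work) / (rss / (n - p_work - 1))
p_value = stats.f.sf(F, p_work, n - p_work - 1)
print(F, p_value)
```

The abstract's result suggests that, despite the omitted-variable misspecification of the working model, this familiar statistic remains approximately valid for testing the predictive significance of the chosen regressors when p_full is very large.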