Title: Post-selection inference in correlation learning
Authors: Kory Johnson - University of Vienna (Austria) [presenting]
Abstract: Forward stepwise regression provides an approximation to the sparse feature selection problem and is used when the number of features is too large to search model space manually. In this setting, we desire a rule for stopping stepwise regression using hypothesis tests while controlling a notion of false rejections. Forward stepwise regression, however, is commonly considered to be ``data dredging'' and not statistically sound: because the hypotheses tested by forward stepwise are determined by looking at the data, the resulting classical hypothesis tests are not valid. We present a simple solution which leverages classical multiple comparison methods to test the stepwise hypotheses via a max-$t$ test. The resulting procedures are fast enough to be used in high-dimensional settings and can be tailored to control the family-wise error rate (FWER) or the false discovery rate (FDR). Competing procedures estimate new, computationally difficult $p$-values and have significantly lower power. We provide both step-up and step-down variants of our procedure. Furthermore, our proofs readily extend to more general correlation learning methods such as sure independence screening.
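The stopping rule described above can be illustrated with a minimal sketch. The code below is not the authors' implementation; it assumes a simple max-$|t|$-style statistic (absolute correlation of each candidate feature with the current residual) and calibrates it against a Monte Carlo estimate of its null distribution under a pure-noise response, stopping when the observed maximum is no longer significant at level `alpha`. All function and parameter names are hypothetical.

```python
import numpy as np

def max_t_stepwise(X, y, alpha=0.05, n_sim=200, seed=0):
    """Forward stepwise selection with a max-|t| stopping rule (illustrative sketch).

    At each step, the largest absolute correlation between the current
    residual and the remaining candidate features is compared to a Monte
    Carlo estimate of its null distribution (the same maximum computed
    with a pure-noise response). Selection stops when the associated
    p-value exceeds alpha.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    selected, remaining = [], list(range(p))
    while remaining:
        # Residualize the response on the currently selected columns
        # via an orthogonal (QR-based) projection.
        if selected:
            Q, _ = np.linalg.qr(X[:, selected])
            r = y - Q @ (Q.T @ y)
        else:
            r = y - y.mean()
        cand = X[:, remaining]
        cand = cand - cand.mean(axis=0)
        norms = np.linalg.norm(cand, axis=0)
        # Max-|t|-style statistic: absolute correlation with the residual.
        stats = np.abs(cand.T @ r) / (norms * np.linalg.norm(r) + 1e-12)
        j = int(np.argmax(stats))
        # Monte Carlo null: distribution of the same maximum when the
        # response carries no signal (standard Gaussian noise).
        null_max = np.empty(n_sim)
        for b in range(n_sim):
            z = rng.standard_normal(n)
            z = z - z.mean()
            null_max[b] = np.max(
                np.abs(cand.T @ z) / (norms * np.linalg.norm(z) + 1e-12)
            )
        pval = np.mean(null_max >= stats[j])
        if pval > alpha:
            break  # stop: the strongest remaining feature is not significant
        selected.append(remaining.pop(j))
    return selected
```

Because only maxima of correlation-type statistics are needed, each step is a matrix-vector product, which is what makes this style of rule feasible in high dimensions; the actual procedure in the abstract additionally controls FWER or FDR across steps.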