View Submission - EcoSta2022
A0899
Title: Sure early selection by searching for the best subset
Authors: Shihao Wu - University of Michigan, Ann Arbor (United States) [presenting]
Ziwei Zhu - University of Michigan, Ann Arbor (China)
Abstract: In scientific discovery, it is often statistically intangible to identify all the important features with no false discovery, let alone the intimidating expense of experiments to test their significance. Such a realistic limitation calls for a statistical guarantee on the early discoveries of a model selector, to navigate scientific adventure on the sea of big data. We focus on the early solution path of best subset selection (BSS), where the sparsity constraint is set lower than the true sparsity. Under a sparse high-dimensional linear model, we establish the sufficient and (near) necessary condition for BSS to achieve sure early selection, or equivalently, zero false discovery throughout its entire early path. Essentially, this condition boils down to a lower bound on the minimum projected signal margin, which characterizes the fundamental gap in signal capturing between sure selection models and those with spurious discoveries. Defined through projection operators, this margin is independent of the restricted eigenvalues of the design, suggesting the robustness of BSS against collinearity. On the computational side, we introduce a screen-then-select (STS) strategy to search for the best subset. A theoretical guarantee for sure early selection using the STS strategy is established. Numerical experiments show that the early solution paths of STS exhibit a much lower false discovery rate than competing approaches.
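To make the screen-then-select idea concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not say how screening is performed, so ranking features by the marginal score |X_j^T y| is an assumption made here for illustration. The exhaustive search, the function names `best_subset` and `screen_then_select`, the parameter `keep`, and the simulated data are likewise hypothetical. The early path is obtained by setting the subset size below the true sparsity (5 in this simulation) and counting false discoveries at each size.

```python
from itertools import combinations

import numpy as np


def best_subset(X, y, size, candidates):
    """Exhaustive search: the size-`size` subset of `candidates` with minimum RSS."""
    best_rss, best_set = np.inf, None
    for subset in combinations(candidates, size):
        cols = list(subset)
        # Least-squares fit restricted to the chosen columns.
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        resid = y - X[:, cols] @ coef
        rss = float(resid @ resid)
        if rss < best_rss:
            best_rss, best_set = rss, subset
    return set(best_set)


def screen_then_select(X, y, size, keep):
    """Screen to `keep` columns, then run best subset on the survivors.

    ASSUMPTION: screening ranks features by |X_j^T y|; the abstract does not
    specify the screening rule.
    """
    scores = np.abs(X.T @ y)
    candidates = np.argsort(scores)[::-1][:keep]
    return best_subset(X, y, size, sorted(candidates.tolist()))


# Simulated sparse linear model (illustrative values only).
rng = np.random.default_rng(0)
n, p, true_support = 200, 50, [0, 1, 2, 3, 4]
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[true_support] = 2.0
y = X @ beta + 0.5 * rng.standard_normal(n)

# Early path: subset sizes strictly below the true sparsity of 5.
early_path = {s: screen_then_select(X, y, size=s, keep=10) for s in range(1, 5)}
false_discoveries = {s: len(sel - set(true_support)) for s, sel in early_path.items()}
print(false_discoveries)
```

Sure early selection in the sense of the abstract corresponds to every entry of `false_discoveries` being zero. Screening first keeps the exhaustive search over a small candidate set, which is what makes the combinatorial step affordable in this sketch.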