CMStatistics 2021
Title: Best subset selection is robust against design dependence
Authors: Ziwei Zhu - University of Michigan, Ann Arbor (China) [presenting]
Jianqing Fan - Princeton University (United States)
Yongyi Guo - Princeton University (United States)
Abstract: Best subset selection (BSS) is widely known as the holy grail for high-dimensional variable selection. We investigate the variable selection properties of BSS when its target sparsity is greater than or equal to the true sparsity. The main message is that BSS is robust against design dependence in terms of achieving model consistency and sure screening and, more importantly, that such robustness can be propagated to the near best subsets that are computationally tangible. Specifically, we introduce an identifiability margin condition that is free of restricted eigenvalues and show that it is sufficient and nearly necessary for BSS to exactly recover the true model. A relaxed version of this condition is also sufficient for BSS to achieve the sure screening property. Moreover, we show that a two-stage fully corrective iterative hard thresholding (IHT) algorithm can provably find a near best subset within a logarithmic number of steps; another round of exact BSS within this set can recover the true model. Simulation studies and real data examples show that IHT yields lower false discovery rates and higher true positive rates than competing approaches, including LASSO, SCAD and Sure Independence Screening (SIS), especially under highly correlated designs.
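To make the two-step idea in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a linear model with squared loss, a fixed step size of one over the squared spectral norm of the design matrix, and a plain least-squares refit on the selected support as the "fully corrective" step. The function names (`fully_corrective_iht`, `best_subset_within`) and the parameters `k`, `s` and `n_iter` are hypothetical, chosen only for illustration.

```python
import itertools

import numpy as np


def hard_threshold_support(v, k):
    """Return the sorted indices of the k largest-magnitude entries of v."""
    return np.sort(np.argsort(np.abs(v))[-k:])


def fully_corrective_iht(X, y, k, n_iter=50):
    """Sketch of IHT with a least-squares refit on each selected support.

    Assumption: step size 1 / ||X||_2^2; the abstract does not specify one.
    """
    p = X.shape[1]
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    support = np.array([], dtype=int)
    for _ in range(n_iter):
        # Gradient step on the squared loss, then keep the k largest entries.
        grad = X.T @ (X @ beta - y)
        support = hard_threshold_support(beta - step * grad, k)
        # Fully corrective step: refit by least squares on the support.
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        new_beta = np.zeros(p)
        new_beta[support] = coef
        if np.allclose(new_beta, beta):
            break
        beta = new_beta
    return beta, support


def best_subset_within(X, y, candidates, s):
    """Exhaustive best subset of size s restricted to the candidate columns."""
    best_rss, best = np.inf, None
    for subset in itertools.combinations(candidates, s):
        cols = list(subset)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = np.sum((y - X[:, cols] @ coef) ** 2)
        if rss < best_rss:
            best_rss, best = rss, cols
    return best
```

In this sketch, `fully_corrective_iht` would be run with a target sparsity `k` at least as large as the assumed true sparsity `s`, and `best_subset_within` would then search only the `k` selected columns, which keeps the exhaustive step small.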