Title: Improving Lasso for sparse high dimensional GLM and Cox model selection
Authors: Piotr Pokarowski - University of Warsaw (Poland) [presenting]
Agnieszka Prochenka - University of Warsaw (Poland)
Michal Frej - University of Warsaw (Poland)
Jan Mielniczuk - Institute of Computer Science Polish Academy of Sciences (Poland)
Abstract: The Lasso, that is, the $l_1$-penalized loss estimator, is a very popular tool for fitting sparse high-dimensional models. However, theory and simulations have established that the model selected by the Lasso is usually too large. Concave regularizations (SCAD, MCP or capped-$l_1$) are closer to $l_0$-penalized loss, that is, to the Generalized Information Criterion (GIC), than the Lasso, and they correct its intrinsic estimation bias. These methods use the Lasso solution as a starting point and try to improve it by local optimization. We propose an alternative method of improving the Lasso for Generalized Linear Models and Cox models which generalizes our SOS algorithm for linear models. For a given penalty, the method orders the absolute values of the non-zero Lasso coordinates and then selects the model from the resulting small nested family by GIC. We derive upper bounds on the selection error of the algorithm and show in numerical experiments on synthetic and real-world data sets that an implementation of our algorithm is more accurate than implementations of concave regularizations.
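The ordering-then-selection step can be sketched for the linear-model case (the setting of the original SOS algorithm). Everything below is an illustrative assumption, not the authors' implementation: the plain coordinate-descent Lasso, the BIC-type GIC penalty $\log n$, and the function names `lasso_cd` and `ss_gic` are all choices made for this sketch.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    # Plain coordinate-descent Lasso (illustrative, not an optimized solver).
    n, p = X.shape
    beta = np.zeros(p)
    col_norm2 = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding coordinate j, then soft-threshold.
            r = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_norm2[j]
    return beta

def ss_gic(X, y, lam):
    # Order |Lasso coefficients|, then choose among the nested models
    # along that ordering by a GIC; here a BIC-type penalty log(n) is
    # assumed for illustration.
    n, p = X.shape
    beta = lasso_cd(X, y, lam)
    support = np.flatnonzero(beta)
    order = support[np.argsort(-np.abs(beta[support]))]
    best_gic, best_model = np.inf, np.array([], dtype=int)
    for k in range(len(order) + 1):
        S = np.sort(order[:k])
        if k == 0:
            rss = y @ y
        else:
            # Unpenalized least-squares refit on the candidate support.
            coef = np.linalg.lstsq(X[:, S], y, rcond=None)[0]
            resid = y - X[:, S] @ coef
            rss = resid @ resid
        gic = n * np.log(rss / n) + np.log(n) * k
        if gic < best_gic:
            best_gic, best_model = gic, S
    return best_model

# Illustrative run on synthetic data: 3 true predictors out of 50.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, 2.0, 1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(n)
selected = ss_gic(X, y, lam=20.0)
```

Because the candidate family is the small nested sequence along the Lasso ordering, only at most as many GIC evaluations as there are non-zero Lasso coordinates are needed, rather than a combinatorial search over all subsets.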