CMStatistics 2020
Title: Efficient methods for high-dimensional robust variable selection
Authors: Wojciech Rejchel - University of Warsaw (Poland) [presenting]
Malgorzata Bogdan - University of Wroclaw (Poland)
Konrad Furmanczyk - Warsaw University of Life Sciences (Poland)
Abstract: Variable selection is a fundamental challenge when working with large-scale data sets in which the number of predictors significantly exceeds the number of observations. In many practical problems (e.g., in genetics or biology), finding a small set of significant predictors is as important as accurate estimation or prediction. We investigate the variable selection problem in the single index model $Y = g(\beta' X, \varepsilon)$, where $Y$ is a response variable, $X$ is a vector of predictors, $\beta$ is the true parameter, and $\varepsilon$ is a random error. We make no assumptions about the distribution of the errors or the existence of their moments. Moreover, $g$ is an unknown function. We propose a computationally fast variable selection procedure based on the standard Lasso, with the response variables replaced by their ranks. If the response variables are binary, our approach is even simpler: we treat their class labels as if they were numbers and apply the standard Lasso. Since our approaches lead to misspecified models, we start by establishing the relation between the true parameter $\beta$ and the parameters that we estimate. We then present theoretical and numerical results describing the variable selection properties of these methods.
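To make the two procedures concrete, here is a minimal pure-Python sketch, assuming the standard Lasso objective $(1/2n)\|y - b_0 - Xb\|_2^2 + \lambda\|b\|_1$ solved by cyclic coordinate descent. The helper names (`ranks`, `lasso_cd`, `rank_lasso`, `binary_lasso`), the scaling of ranks to $(0, 1]$, the penalty value and the simulated model are all hypothetical illustrations, not the authors' implementation; choosing $\lambda$ is not addressed.

```python
import math
import random


def ranks(y):
    """Return the ranks 1..n of the values in y, averaging ranks over ties."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    r = [0.0] * len(y)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and y[order[j + 1]] == y[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100, tol=1e-8):
    """Minimise (1/(2n)) * ||y - b0 - X b||^2 + lam * ||b||_1 by cyclic
    coordinate descent; the intercept b0 is removed by centring y and X."""
    n, p = len(X), len(X[0])
    ybar = sum(y) / n
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    b = [0.0] * p
    resid = [v - ybar for v in y]  # residual y_c - X_c b (b = 0 initially)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: coefficient stays 0
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def rank_lasso(X, y, lam):
    """Lasso with each response replaced by its rank (scaled to (0, 1] here)."""
    n = len(y)
    return lasso_cd(X, [r / n for r in ranks(y)], lam)


def binary_lasso(X, labels, lam):
    """Lasso applied directly to 0/1 class labels treated as numbers."""
    return lasso_cd(X, [float(label) for label in labels], lam)


if __name__ == "__main__":
    random.seed(0)
    n, p = 60, 100  # more predictors than observations
    beta = [1.0, -1.0, 0.5] + [0.0] * (p - 3)  # hypothetical sparse parameter
    X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

    def cauchy():
        # standard Cauchy draw: heavy tails, no finite mean or variance
        return math.tan(math.pi * (random.random() - 0.5))

    index = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    # hypothetical single index model Y = g(beta'X, eps) with g(u, e) = exp(u) + e
    y = [math.exp(u) + cauchy() for u in index]
    labels = [1 if u + cauchy() > 0 else 0 for u in index]

    lam = 0.05  # illustrative value, not tuned
    print("rank Lasso selects:",
          [j for j, bj in enumerate(rank_lasso(X, y, lam)) if bj != 0.0])
    print("binary Lasso selects:",
          [j for j, bj in enumerate(binary_lasso(X, labels, lam)) if bj != 0.0])
```

Replacing $Y$ by its ranks gives a bounded response, so heavy-tailed errors such as the Cauchy noise above cannot produce extreme response values; the ranks, and hence the fitted coefficients, are also unchanged by any strictly increasing transformation of $Y$.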