Title: Tuning model-based gradient boosting algorithms with focus on variable selection
Authors: Tobias Hepp - Friedrich-Alexander-Universitaet Erlangen-Nuernberg (Germany) [presenting]
Janek Thomas - Ludwig-Maximilians-University Munich (Germany)
Andreas Mayr - University of Bonn (Germany)
Bernd Bischl - Ludwig-Maximilians-University Munich (Germany)
Abstract: Variable selection in regularized regression models such as the lasso or gradient boosting algorithms is usually controlled by method-specific tuning parameters that define the degree of penalization. While these parameters are commonly determined via resampling strategies such as cross-validation or bootstrapping, their focus on minimizing the prediction error often results in the selection of many variables with no true effect on the outcome. We therefore propose a new method to determine the optimal number of iterations in model-based boosting for variable selection, inspired by probing, a technique used in related areas of machine learning research. The general notion of probing involves artificially inflating the data with random noise variables, so-called probes or shadow variables. Using the first selection of a shadow variable as the stopping criterion, the algorithm is run only once, without the need to optimize any hyperparameters, to extract a set of informative variables from the data, making its application very fast and simple in practice. Furthermore, simulation studies show that the resulting models tend to be more strictly regularized than those obtained with cross-validation, substantially reducing the high number of false discoveries.
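The probing idea described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes componentwise L2 boosting with simple linear base-learners on standardized data, appends shadow variables obtained by independently permuting each original column, and stops the first time a shadow column gives the best fit to the current residuals. The function name `probing_boost` and all parameter choices (step length `nu`, iteration cap) are hypothetical.

```python
import numpy as np

def probing_boost(X, y, nu=0.1, max_iter=1000, seed=0):
    """Componentwise L2 boosting stopped at the first selection of a
    shadow variable. Returns indices of the original (non-shadow)
    variables selected before stopping."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Shadow variables: each original column permuted independently,
    # so marginal distributions are preserved but any link to y is destroyed.
    shadows = rng.permuted(X, axis=0)
    Z = np.hstack([X, shadows])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)   # standardize all columns
    resid = y - y.mean()                       # start from the intercept model
    selected = set()
    for _ in range(max_iter):
        # Least-squares slope of each standardized column vs. the residuals;
        # squared slope is proportional to the reduction in squared error.
        coefs = Z.T @ resid / n
        j = int(np.argmax(coefs ** 2))
        if j >= p:                             # first shadow variable wins -> stop
            break
        selected.add(j)
        resid = resid - nu * coefs[j] * Z[:, j]  # shrunken update of the fit
    return sorted(selected)
```

With strong true effects, the informative columns are picked up in the early iterations, while the shadow columns only become competitive once the remaining residual signal is at noise level, which is exactly when the algorithm stops.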