Title: Combinatorial strategies for greedy regression selection
Authors: Georgiana-Elena Pascaru - Alexandru Ioan Cuza University of Iasi (Romania)
Cristian Gatu - Alexandru Ioan Cuza University of Iasi (Romania) [presenting]
Erricos John Kontoghiorghes - Cyprus University of Technology and Birkbeck University of London, UK (Cyprus)
Abstract: Greedy kind algorithms are an established approach to the regression model selection problem. A main drawback of these methods is the reduced number of submodels that are evaluated in order to select a solution, thus failing to find the optimum. Two strategies that aim to overcome this issue are investigated: selection--k (SEL--k) and tree selection--k (TREE--k). SEL--k builds on standard forward selection, but selects at each step the best $k$ variables, instead of the only best one, thus allowing a degree of correlation between the variables included in the solution model. TREE--k is a method that explores a combinatorial search space, thus increasing the number of submodels that are investigated. Specifically, at each step of the algorithm a new search brunch is considered for each of the best $k$ most significant variable, say $x_1, \cdots, x_k$. On each of the $i$--th branch ($i = 1, \cdots, k$), the selection will always include $x_i$ in the submodel. A branch will terminate either when there are no more significant variables to choose from, or when all variables have been considered. Various experiments are conducted on both real and artificially generated datasets in order to assess the two proposed algorithms. The results are presented and discussed.