Title: Using correlated resampling to improve variable selection for linear and generalized linear models
Authors: Myriam Maumy - IRMA/Universite de technologie de Troyes (France) [presenting]
Frederic Bertrand - IRMA/Universite de technologie de Troyes (France)
Abstract: Technological innovations make it possible to measure large amounts of data in a single observation. Hence, problems in which the number of variables exceeds the number of observations have become common. As reviewed almost twenty years ago, such situations arise in many fields, from the fundamental sciences to the social sciences, and variable selection is required to tackle them. Moreover, in such studies the correlation between variables is often very strong, and variable selection methods often fail to distinguish the informative variables from the uninformative ones. As a consequence, variable selection has become one of the critical challenges in statistics, and many methods have already been proposed in the literature. If the number of variables far exceeds the number of observations, or if the variables are highly correlated, the performance of variable selection methods is generally limited in both recall and precision. We propose a general algorithm that enhances model selection in datasets with correlated variables. We use the correlation structure to select reliable variables in parsimonious or non-parsimonious linear or generalized linear regression problems. Thanks to correlated resampling techniques, it is possible to improve the performance of many common existing methods (glmnet, lasso, spls), as demonstrated on both simulated and real datasets using a comprehensive simulation benchmark.
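Illustration: the abstract does not specify the algorithm, so the following is only a minimal sketch of one way a correlated-resampling scheme could be organized, not the authors' method. It assumes that variables whose absolute correlation with another variable reaches a threshold c0 are perturbed with Gaussian noise (a simple stand-in chosen for illustration), that a plain coordinate-descent lasso serves as the base selector, and that a variable is judged reliable by how often it is selected across perturbed datasets. All function names and the parameters c0, noise, lam and B are hypothetical.

```python
import numpy as np

def unit_columns(X):
    """Center each column and scale it to unit Euclidean norm."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso for 0.5*||y - Xb||^2 + lam*||b||_1.
    Assumes the columns of X have unit norm."""
    beta = np.zeros(X.shape[1])
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rho = X[:, j] @ resid + beta[j]            # partial-residual fit
            new = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft threshold
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def perturb_correlated(X, c0, noise, rng):
    """Add Gaussian noise to every column that has at least one partner
    with |correlation| >= c0, then renormalize (illustrative assumption)."""
    corr = np.abs(X.T @ X)         # correlations, since columns are unit-norm
    np.fill_diagonal(corr, 0.0)
    in_group = (corr >= c0).any(axis=0)
    Xp = X.copy()
    n, k = X.shape[0], int(in_group.sum())
    Xp[:, in_group] += noise * rng.standard_normal((n, k)) / np.sqrt(n)
    return unit_columns(Xp)

def selection_frequency(X, y, lam, c0=0.7, noise=0.5, B=50, seed=0):
    """Fraction of B perturbed datasets in which each variable is selected."""
    rng = np.random.default_rng(seed)
    Xn = unit_columns(X)
    yc = y - y.mean()
    counts = np.zeros(X.shape[1])
    for _ in range(B):
        Xp = perturb_correlated(Xn, c0, noise, rng)
        counts += lasso_cd(Xp, yc, lam) != 0
    return counts / B

# Usage: two nearly collinear predictors, only the first is informative.
rng = np.random.default_rng(1)
x0 = rng.standard_normal(100)
X_demo = np.column_stack([x0, x0 + 0.1 * rng.standard_normal(100),
                          rng.standard_normal((100, 3))])
y_demo = 3.0 * x0 + rng.standard_normal(100)
print(selection_frequency(X_demo, y_demo, lam=2.0))
```

In this sketch, a variable that stays selected when its correlated neighbours are perturbed would be treated as reliable, while one whose selection frequency drops would be treated as interchangeable with its neighbours. The threshold on that frequency is a tuning choice the sketch leaves open.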