CMStatistics 2017
B1374
Title: A prediction-based algorithm for variable selection with applications in genomics
Authors: Roberto Molinari - University of Geneva (Switzerland) [presenting]
Stephane Guerrier - Pennsylvania State University (United States)
Yanyuan Ma - Pennsylvania State University (United States)
Marco Avella Medina - MIT (United States)
Samuel Orso - University of Geneva (Switzerland)
Mili Nabil - University of Geneva (Switzerland)
Abstract: The task of model selection is often associated with the minimization of a given loss function which, in the vast majority of cases, is linked to the objective function used to estimate the model parameters (e.g. the likelihood function). However, in many applied cases, these loss functions are not necessarily what practitioners are interested in minimizing. For this purpose, we propose a new algorithm which, among other things, makes use of cross-validation to deliver tailor-made variable selection criteria that respond to the needs of practitioners. This approach is flexible in terms of both the modelling framework of reference and the criteria of interest, and it is able to deliver a set of sparse models with extremely high predictive power. The latter is particularly useful in the field of genomics where, among millions of gene transcripts, it is important not only to select a few genes that can, for example, predict the presence of certain diseases, but also to build gene networks that can have important biological interpretations. Some applied examples show how the proposed method not only performs better than existing approaches in terms of prediction accuracy but also delivers a flexible framework that opens new avenues for model estimation and variable selection in large data settings.
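The general idea of ranking sparse models by a cross-validated, user-chosen criterion can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. The exhaustive search over small subsets, the least-squares fit, the mean absolute error criterion, the simulated data, and all function names (`cv_loss`, `select_models`, `mean_abs_error`) are assumptions made purely for illustration.

```python
import itertools
import numpy as np

def cv_loss(X, y, cols, loss, n_folds=5, seed=0):
    """Average out-of-fold loss of a least-squares fit on the columns in `cols`."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Design matrices with an intercept, restricted to the candidate columns.
        A_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask][:, cols]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx][:, cols]])
        beta, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        # The criterion is supplied by the user, not tied to the fitting objective.
        scores.append(loss(y[test_idx], A_test @ beta))
    return float(np.mean(scores))

def select_models(X, y, loss, max_size=2, keep=3):
    """Return the `keep` best subsets of up to `max_size` variables by CV loss.

    Exhaustive enumeration is an illustrative simplification that is only
    feasible for a handful of variables.
    """
    results = []
    for size in range(1, max_size + 1):
        for cols in itertools.combinations(range(X.shape[1]), size):
            results.append((cv_loss(X, y, list(cols), loss), cols))
    results.sort(key=lambda r: r[0])
    return results[:keep]

def mean_abs_error(y_true, y_pred):
    """Hypothetical practitioner-chosen criterion."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Simulated data (assumption): only variables 1 and 4 drive the response.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 1] - 3.0 * X[:, 4] + rng.normal(scale=0.5, size=200)

top_models = select_models(X, y, mean_abs_error)
for score, cols in top_models:
    print(cols, round(score, 3))
```

Returning several low-loss subsets rather than a single winner mirrors the abstract's mention of delivering a set of sparse models. Swapping `mean_abs_error` for another function of the observed and predicted values changes the selection criterion without touching the fitting step.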