CMStatistics 2017
Title: Cross-validation for estimator selection
Authors: Sylvain Arlot - Universite Paris-Sud and INRIA (France) [presenting]
Alain Celisse - Lille University (France)
Matthieu Lerasle - CNRS (France)
Abstract: Cross-validation is a widespread strategy because of its simplicity and its (apparent) universality. It can be used with two main goals: (i) estimating the risk of an estimator, and (ii) model selection or hyperparameter tuning, or more generally choosing among a family of estimators. Many results exist on the performance of cross-validation procedures, and this performance can depend strongly on the goal for which they are used. The big picture of these results will be presented, with an emphasis on the goal of estimator selection. In short, at first order (when the sample size goes to infinity), the key parameter is the bias of cross-validation, which depends only on the size of the training set. Nevertheless, second-order terms do matter in practice, and we will show recent results on the role of the "variance" of cross-validation procedures in their performance. In conclusion, we will provide some guidelines for choosing the best cross-validation procedure according to the particular features of the problem at hand.
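
To make goal (ii) concrete, the following is a minimal sketch of V-fold cross-validation used to choose among a family of estimators. It is not taken from the talk: the k-nearest-neighbour regression family, the synthetic sine data, and the names knn_predict, vfold_risk and select_k are all assumptions introduced purely for illustration.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the responses of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def vfold_risk(data, k, n_folds):
    """V-fold cross-validation estimate of the squared-error risk of k-NN."""
    # Split the sample into n_folds disjoint blocks of (nearly) equal size.
    folds = [data[i::n_folds] for i in range(n_folds)]
    total = 0.0
    for i, held_out in enumerate(folds):
        # Train on every block except the i-th, then test on the i-th.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        total += sum((y - knn_predict(train, x, k)) ** 2 for x, y in held_out)
    return total / len(data)


def select_k(data, candidates, n_folds):
    """Return the candidate k with the smallest cross-validated risk."""
    risks = {k: vfold_risk(data, k, n_folds) for k in candidates}
    return min(risks, key=risks.get), risks


# Hypothetical data: a noisy sine curve observed at 200 random design points.
rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
data = [(x, math.sin(2 * math.pi * x) + rng.gauss(0, 0.3)) for x in xs]

candidates = [1, 5, 20, 100]
best_k, risks = select_k(data, candidates, n_folds=5)
print("selected k:", best_k)
for k in candidates:
    print(f"k={k:3d}  estimated risk={risks[k]:.4f}")
```

With n = 200 observations and 5 folds, each estimator is trained on 160 points. This training-set size, n(1 - 1/V) for V folds, is the quantity the abstract identifies as driving the bias of cross-validation; changing n_folds changes it, and also changes how variable the risk estimates are.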