B0973
Title: Cross-validation improved by aggregation: Agghoo
Authors: Guillaume Maillard - Université Paris-Sud (France) [presenting]
Matthieu Lerasle - CNRS (France)
Sylvain Arlot - Université Paris-Sud and INRIA (France)
Abstract: Cross-validation is widely used for selecting among a family of learning rules. A related method, called aggregated hold-out (Agghoo), is studied, which mixes cross-validation with aggregation; Agghoo can also be related to bagging. We provide the first theoretical guarantees on Agghoo, ensuring that one can use it safely: at worst, Agghoo performs like the hold-out, up to a constant factor. For the hold-out, oracle inequalities were known in the case of bounded losses, as in binary classification. The approach can be extended to most classical risk-minimization problems, including regression with least-squares or other losses; it works particularly well with Lipschitz losses such as the Huber loss or the quantile loss. In all these settings, Agghoo satisfies an oracle inequality. However, simulation studies suggest that its actual performance is often much better than what theory can currently prove. In particular, aggregation yields a large gain that current bounds, derived from the hold-out, cannot capture. As a result, Agghoo appears to be competitive with standard cross-validation in practice.
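To make the procedure concrete, here is a minimal sketch of Agghoo in Python for a regression setting. The ridge family of learning rules, the mean-squared-error hold-out risk, averaging as the aggregation rule, and all names (agghoo_regression, n_splits, train_frac) are illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def agghoo_regression(X, y, alphas, n_splits=10, train_frac=0.8, seed=0):
    """Sketch of aggregated hold-out (Agghoo) over a ridge family.

    For each of n_splits random splits, hold-out selection picks the
    hyperparameter with the smallest validation risk; the predictor
    trained with that hyperparameter is kept. Agghoo's final prediction
    averages the kept predictors (an illustrative aggregation rule).
    """
    rng = np.random.RandomState(seed)
    predictors = []
    for _ in range(n_splits):
        X_tr, X_val, y_tr, y_val = train_test_split(
            X, y, train_size=train_frac, random_state=rng)
        # Hold-out selection on this split.
        best_model, best_risk = None, np.inf
        for alpha in alphas:
            model = Ridge(alpha=alpha).fit(X_tr, y_tr)
            risk = mean_squared_error(y_val, model.predict(X_val))
            if risk < best_risk:
                best_model, best_risk = model, risk
        predictors.append(best_model)
    # Aggregation step: average the hold-out-selected predictors.
    return lambda X_new: np.mean(
        [p.predict(X_new) for p in predictors], axis=0)

# Hypothetical usage on synthetic data.
X = np.random.randn(200, 5)
y = X @ np.ones(5) + np.random.randn(200)
predict = agghoo_regression(X, y, alphas=[0.01, 0.1, 1.0, 10.0])
y_hat = predict(X)

With n_splits=1 the sketch reduces to the classical hold-out; averaging over several splits is the aggregation step that, per the simulation studies, accounts for the gain current bounds cannot capture.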