Title: Challenges in assessing lack of fit for non-parametric quantile models
Authors: Ting Zhang - McGill University (Canada) [presenting]
Gael Varoquaux - INRIA (France)
Jean-Baptiste Poline - McGill University (Canada)
Celia Greenwood - McGill University (Canada)
Abstract: Assessing model fit in non-parametric quantile regression with multiple predictors is challenging. A new paradigm has been proposed for testing lack of fit: test the equality of the two covariate distributions obtained by separating the data at the fitted quantile. However, this test has two limitations: (1) it detects underfitting (e.g. a missing covariate) but not overfitting; (2) it uses the same data for both model fitting and lack-of-fit testing, which leads to invalid type 1 error rates. We propose to improve the testing procedure by (1) splitting the data into training and testing sets, (2) replacing the core test statistic for distributional equality with an $L_1$ kernel mean embedding, and (3) modifying the wild bootstrap method used to estimate significance. We first illustrate the problems through extensive simulation studies, and then compare the proposed modifications (2) and (3) with the original lack-of-fit test after data splitting. Performance is assessed by type 1 error control, power, and computational speed. Our replacement test statistic discriminates better and has a known distribution, which makes computation much faster than with the previous method. However, since the true data-generating model is always unknown, any lack-of-fit test that relies on models fitted to the observed data is intrinsically problematic.
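
The general idea of a split-sample lack-of-fit test can be sketched as follows. This is only an illustrative sketch and not the procedure evaluated in the abstract: it assumes a k-nearest-neighbour quantile estimator as a stand-in for the non-parametric quantile model, a standard Gaussian-kernel squared MMD in place of the $L_1$ kernel mean embedding statistic, and a plain permutation test in place of the wild bootstrap. All function names, the simulated data, and the tuning values (k, number of permutations, median-heuristic bandwidth) are hypothetical.

```python
import numpy as np


def knn_quantile(x_train, y_train, x_query, tau=0.5, k=30):
    """k-nearest-neighbour estimate of the conditional tau-quantile.

    Assumption: a simple stand-in for whatever non-parametric
    quantile model is actually being assessed.
    """
    dist = np.linalg.norm(x_query[:, None, :] - x_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return np.quantile(y_train[nearest], tau, axis=1)


def mmd2(a, b, bandwidth):
    """Biased squared MMD between samples a and b with a Gaussian kernel."""
    def gram(u, v):
        sq = ((u[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq / (2.0 * bandwidth ** 2))
    return gram(a, a).mean() + gram(b, b).mean() - 2.0 * gram(a, b).mean()


def lack_of_fit_pvalue(x_train, y_train, x_test, y_test,
                       tau=0.5, k=30, n_perm=200, seed=0):
    """Split-sample lack-of-fit p-value (illustrative only).

    The model is fitted on the training set; the test covariates are
    then separated by whether the response lies above or below the
    fitted quantile, and the two covariate samples are compared.
    """
    rng = np.random.default_rng(seed)
    q_hat = knn_quantile(x_train, y_train, x_test, tau, k)
    above = y_test > q_hat
    if above.all() or not above.any():
        raise ValueError("all test responses fall on one side of the fit")

    # Median heuristic for the kernel bandwidth (an assumption).
    sq = ((x_test[:, None, :] - x_test[None, :, :]) ** 2).sum(axis=2)
    bandwidth = np.sqrt(np.median(sq[sq > 0]) / 2.0)

    observed = mmd2(x_test[above], x_test[~above], bandwidth)

    # Permutation null: shuffle the above/below labels over covariates.
    perm_stats = np.empty(n_perm)
    for i in range(n_perm):
        labels = rng.permutation(above)
        perm_stats[i] = mmd2(x_test[labels], x_test[~labels], bandwidth)
    return (1 + np.sum(perm_stats >= observed)) / (1 + n_perm)


# Simulated example: the response depends on both covariates, but the
# model is fitted with the first covariate only (a missing covariate).
rng = np.random.default_rng(1)
x = rng.normal(size=(400, 2))
y = np.sin(x[:, 0]) + x[:, 1] + 0.3 * rng.normal(size=400)
x_tr, x_te, y_tr, y_te = x[:200], x[200:], y[:200], y[200:]

# The fit sees only column 0; the test compares the full covariates.
q_hat_demo = knn_quantile(x_tr[:, :1], y_tr, x_te[:, :1])
p_value = lack_of_fit_pvalue(x_tr, y_tr, x_te, y_te)
```

Fitting on one half of the data and testing on the other keeps the label assignment in the test set from being driven by the same observations that shaped the fit, which is the motivation for modification (1) above. As the closing sentence of the abstract cautions, a non-significant result from such a test does not establish that the fitted model matches the true data-generating model.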