Title: On the likelihood ratio test in high-dimensional logistic regressions
Authors: Pragya Sur - Indian Statistical Institute (India)
Yuxin Chen - Princeton University (United States) [presenting]
Authors: Emmanuel Candes - Stanford University (United States)
Abstract: Logistic regression is used thousands of times a day to fit data and assess the statistical significance of explanatory variables. When used for statistical inference, logistic models produce $p$-values for the regression coefficients by invoking a large-sample approximation to the distribution of the likelihood-ratio test (LRT). However, this asymptotic approximation is grossly incorrect when the number $p$ of explanatory variables is comparable to the sample size $n$; in fact, it yields $p$-values that are far too small under the null hypothesis. We show that in this high-dimensional regime, the LRT converges to a rescaled chi-square distribution, where the rescaling factor can be determined by solving a nonlinear system of two equations in two unknowns. We complement our mathematical study by showing that the new limiting distribution is accurate at finite sample sizes. The results extend to other regression models, such as the probit regression model.
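The inflation of the classical LRT described above can be seen in a small simulation. The following is an illustrative sketch only (not the authors' code); the dimensions, the Newton-Raphson solver, and the global-null design are my own choices for exposition. With $p/n = 0.2$, the LRT statistic for a single null coefficient has an empirical mean noticeably above 1, the mean of its nominal $\chi^2_1$ limit, so classical $p$-values are too small:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, y, iters=50):
    """Logistic-regression MLE via Newton-Raphson; returns (coefficients, log-likelihood)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))          # fitted probabilities
        grad = X.T @ (y - mu)                    # score vector
        H = X.T @ (X * (mu * (1 - mu))[:, None]) # observed information
        step = np.linalg.solve(H + 1e-10 * np.eye(p), grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    eta = X @ beta
    # log-likelihood: sum_i y_i*eta_i - log(1 + exp(eta_i)), computed stably
    ll = np.sum(y * eta - np.logaddexp(0.0, eta))
    return beta, ll

n, p = 400, 80          # p/n = 0.2: the high-dimensional regime
reps = 200
lrt = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, p)) / np.sqrt(n)   # Gaussian design, unit-norm columns
    y = (rng.random(n) < 0.5).astype(float)        # global null: all coefficients zero
    _, ll_full = fit_logistic(X, y)
    _, ll_reduced = fit_logistic(X[:, 1:], y)      # drop the tested variable
    lrt[r] = 2.0 * (ll_full - ll_reduced)          # LRT for H0: beta_1 = 0

# Under the classical chi2(1) approximation the mean would be near 1;
# in this regime it is inflated, as the rescaled chi-square result predicts.
print(np.mean(lrt))
```

The rescaling factor itself solves the nonlinear two-equation system described in the abstract; the sketch above only exhibits the miscalibration that the correction repairs.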