Title: Power analysis for knockoff-calibrated high-dimensional logistic regression
Authors: Jing Zhou - KU Leuven (Belgium) [presenting]
Gerda Claeskens - KU Leuven (Belgium)
Abstract: Logistic regression, a commonly used binary classification method, has been well studied in the classical setting where the number of parameters $p$ is fixed and the sample size $n \to \infty$. However, modern data structures are more versatile and allow both $p$ and $n$ to tend to infinity at a given relative growth rate. We focus on a high-dimensional setting with a linear rate $p/n \to \delta \in (0, \infty)$ and a sparse coefficient vector $\beta$ whose components are each nonzero with probability $s$. To estimate $\beta$, we consider the $l_1$-regularized logistic regression estimator $\widehat\beta$, whose limiting representation is characterized by a system of equations. This system yields exact expressions for performance measures of $\widehat\beta$ such as the mean squared error and the probabilities of true and false discoveries. We show that these performance measures can be used to analyze theoretically the power of knockoff-calibrated estimators, which control the false discovery rate (FDR). Further, we derive analytical expressions for an estimator of the FDR that can be used in practice without requiring any information about $\beta$. We evaluate the performance of the knockoff-calibrated estimators in an extensive simulation study.
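Knockoff calibration selects variables by thresholding feature statistics $W_j$, where a large positive $W_j$ means the original variable $j$ looks more important than its knockoff copy. The Python code below is a minimal sketch of the knockoff+ selection rule of Barber and Candès at a target FDR level $q$. It assumes the statistics $W_j$ have already been computed, for example as $|\widehat\beta_j| - |\widehat\beta_{j+p}|$ from an $l_1$-regularized logistic fit on the design augmented with knockoffs. That choice of statistic, and the function names `knockoff_plus_threshold` and `knockoff_select`, are illustrative assumptions and do not necessarily match the authors' calibration. Constructing the knockoff variables themselves is not covered here.

```python
import math
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Knockoff+ threshold T for feature statistics w at target FDR level q.

    Hypothetical helper: candidate thresholds are the nonzero magnitudes |W_j|.
    Returns math.inf when no candidate satisfies the bound, so nothing is selected.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        # Statistics at or below -t act as a proxy count of false positives.
        n_neg = sum(1 for x in w if x <= -t)
        # Statistics at or above t are the variables that would be selected.
        n_pos = sum(1 for x in w if x >= t)
        # Knockoff+ estimate of the false discovery proportion (note the "+1").
        fdp_hat = (1 + n_neg) / max(1, n_pos)
        if fdp_hat <= q:
            return t  # smallest threshold meeting the bound
    return math.inf


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Indices j with W_j >= T, i.e. the knockoff+ selected set."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    # Made-up statistics for illustration only, not results from the study.
    w = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, -1.0, 0.5, -0.2]
    print(knockoff_plus_threshold(w, q=0.2))  # 1.5
    print(knockoff_select(w, q=0.2))          # [0, 1, 2, 3, 4, 5, 6]
```

In this toy example the thresholds 0.2, 0.5 and 1.0 give estimated false discovery proportions of 3/8, 2/8 and 2/7, all above 0.2. At $t = 1.5$ the estimate drops to 1/7, so the first seven variables are selected. The "+1" in the numerator is what makes knockoff+ control the FDR rather than a modified FDR.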