Title: Time efficient multiple imputation with penalization for high-dimensional data
Authors: Faisal Maqbool Zahid - Ludwig-Maximilians-University Munich (Germany) [presenting]
Christian Heumann - Ludwig-Maximilians-University Munich (Germany)
Abstract: The analysis of modern data based on high-throughput technology often faces the problem of missing data. Multiple imputation (MI) by sequential regression is a flexible and practical approach to handling missing data. However, the literature offers no clear strategy for conducting MI with high-dimensional data, and the number of predictors to include in the imputation model is also debated. Maximum likelihood estimates become unstable when the number of predictors $p$ is large relative to the sample size $n$, and do not exist for $p>n$. To select and fit the imputation model for high-dimensional data, we use penalization in different ways. We tune the L1 penalty to allow different numbers of informative predictors in the imputation model, and then fit the imputation model using maximum likelihood estimation or an L2 penalty. We compare the performance of the proposed approaches in high-dimensional data structures through different simulation studies and two real-life datasets. The proposed approach is time efficient and performs equally well in low dimensions. The proposed methods perform better than existing MI approaches in terms of mean squared imputation error (MSIE) and MSE($\hat\beta$).
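The two-stage idea in the abstract (an L1 penalty tuned to admit a chosen number of predictors, followed by an L2-penalized or maximum likelihood refit) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`lasso_cd`, `select_predictors`, `ridge_fit`, `impute_column`), the penalty grid, the bootstrap step, and the Gaussian noise draw are all assumptions made for illustration, and the sketch handles only a single incomplete variable with fully observed predictors rather than a full sequential regression cycle.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=50, tol=1e-6):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (X, y centered)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()          # residual y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != beta[j]:
                resid += X[:, j] * (beta[j] - new)
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return beta

def select_predictors(X, y, k, n_lams=15):
    """Decrease the L1 penalty until more than k predictors would be active."""
    n = len(y)
    lam_max = np.max(np.abs(X.T @ y)) / n
    active = np.array([], dtype=int)
    for lam in lam_max * np.logspace(0, -3, n_lams):
        cand = np.flatnonzero(lasso_cd(X, y, lam))
        if cand.size > k:
            break
        active = cand
    if active.size == 0:                    # fallback: most correlated predictor
        active = np.array([int(np.argmax(np.abs(X.T @ y)))])
    return active

def ridge_fit(X, y, alpha):
    """L2-penalized least squares; alpha=0 is the ML fit when X has full column rank."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def impute_column(X, y, k=5, alpha=1.0, m=5, seed=0):
    """Return m completed copies of y (NaN = missing), with X fully observed."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(y)
    X_obs, y_obs = X[~miss], y[~miss]
    completed = []
    for _ in range(m):
        # assumption: a bootstrap resample stands in for parameter uncertainty
        idx = rng.integers(0, len(y_obs), len(y_obs))
        Xb, yb = X_obs[idx], y_obs[idx]
        mu_x, mu_y = Xb.mean(axis=0), yb.mean()
        Xc, yc = Xb - mu_x, yb - mu_y
        active = select_predictors(Xc, yc, k)
        coef = ridge_fit(Xc[:, active], yc, alpha)
        resid = yc - Xc[:, active] @ coef
        sigma = np.sqrt(resid @ resid / max(len(yc) - active.size, 1))
        pred = mu_y + (X[miss][:, active] - mu_x[active]) @ coef
        filled = y.copy()
        filled[miss] = pred + rng.normal(0.0, sigma, int(miss.sum()))
        completed.append(filled)
    return completed

# Simulated p > n example: 60 rows, 100 predictors, 3 of them informative.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(60, 100))
y_demo = X_demo[:, 0] - 2.0 * X_demo[:, 1] + 0.5 * X_demo[:, 2] + rng.normal(scale=0.3, size=60)
y_demo[:12] = np.nan                        # 12 values set to missing
imputations = impute_column(X_demo, y_demo, k=5, alpha=1.0, m=3)
print(len(imputations), imputations[0].shape)
```

Selecting at most `k` predictors before the refit keeps the second-stage linear system small, which is one plausible reading of why such an approach can be time efficient; setting `alpha=0` corresponds to the maximum likelihood variant when `k` is below the number of observed rows.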