Title: Logistic regression with missing continuous and categorical data
Authors: Julie Josse - INRIA (France)
Wei Jiang - Ecole Polytechnique (France) [presenting]
Erwan Scornet - Polytechnique (France)
Abstract: A recommended approach to inference with missing values is to use an EM algorithm to obtain maximum likelihood estimates and a supplemented EM algorithm to estimate their variance. However, such algorithms are often said to be difficult to derive, and indeed this approach is almost never used in practice nor implemented. (Multiple) imputation is more popular and has the great advantage of not being tied to a single statistical method: many analyses can be carried out on the same imputed data. We will thoroughly compare both approaches for performing logistic regression with missing values on data containing both categorical and continuous variables. We will present a fairly straightforward Monte Carlo EM algorithm based on Sampling Importance Resampling, which can be used as an alternative to the computationally demanding adaptive rejection sampling within Gibbs in the framework of generalized linear models (GLMs). The imputation methods considered include recent proposals for mixed data based on principal component methods and on non-parametric Bayesian approaches. The methods will be illustrated on a large registry from the Paris Hospital (APHP), used to model the decisions made and events occurring when severe trauma patients are handled by emergency doctors.
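To illustrate what a Sampling Importance Resampling (SIR) step inside a Monte Carlo EM can look like, here is a minimal sketch for a single incomplete row. It assumes continuous covariates following a multivariate Gaussian N(mu, sigma) and a logistic model with intercept `beta0` and coefficients `beta`. The function and parameter names are hypothetical, categorical covariates are not handled, and this is not the authors' implementation. Candidates for the missing entries are drawn from the conditional Gaussian p(x_mis | x_obs), weighted by the logistic likelihood p(y | x), and one candidate is resampled with probability proportional to its weight.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def sir_draw_missing(x, y, mu, sigma, beta0, beta, n_candidates=1000, rng=None):
    """Hypothetical helper: draw the missing entries (NaN) of one covariate
    vector x from p(x_mis | x_obs, y) by Sampling Importance Resampling.
    Assumes x ~ N(mu, sigma) and P(y = 1 | x) = sigmoid(beta0 + x @ beta)."""
    rng = np.random.default_rng() if rng is None else rng
    miss = np.isnan(x)
    obs = ~miss
    if not miss.any():
        return x.copy()
    # Proposal: conditional Gaussian p(x_mis | x_obs) under the assumed N(mu, sigma).
    if obs.any():
        s_oo = sigma[np.ix_(obs, obs)]
        s_mo = sigma[np.ix_(miss, obs)]
        gain = np.linalg.solve(s_oo, s_mo.T).T  # S_mo S_oo^{-1}
        cond_mean = mu[miss] + gain @ (x[obs] - mu[obs])
        cond_cov = sigma[np.ix_(miss, miss)] - gain @ s_mo.T
    else:
        cond_mean = mu[miss]
        cond_cov = sigma[np.ix_(miss, miss)]
    draws = rng.multivariate_normal(cond_mean, cond_cov, size=n_candidates)
    # Complete each candidate row with the drawn values.
    cand = np.tile(x, (n_candidates, 1))
    cand[:, miss] = draws
    # Importance weights: logistic likelihood of the observed response y.
    p = sigmoid(beta0 + cand @ beta)
    w = p if y == 1 else 1.0 - p
    w = w / w.sum()
    # Resample one candidate with probability proportional to its weight.
    return cand[rng.choice(n_candidates, p=w)]


rng = np.random.default_rng(0)
mu = np.zeros(3)
sigma = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]])
x = np.array([0.3, np.nan, -1.0])
x_completed = sir_draw_missing(x, 1, mu, sigma, 0.0, np.array([1.0, 2.0, -1.0]), rng=rng)
```

In a full Monte Carlo EM, the E-step would repeat such draws several times per incomplete row, and the M-step would then re-estimate mu, sigma and the logistic coefficients on the completed data; the proposal only needs the Gaussian conditional, which avoids building a rejection sampler.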