Title: Double sampling and semiparametric methods for informatively missing data
Authors: Sebastien Haneuse - Harvard TH Chan School of Public Health (United States) [presenting]
Abstract: Large observational databases, such as those derived from electronic health records (EHR), are increasingly being used for clinical and public health research. Despite their many benefits, these data are often subject to complex and poorly understood patterns of missingness, such that the usual missing-at-random assumption may be untenable. In contrast to traditional methods of sensitivity analysis and estimation of parameter bounds, we explore double sampling, in which complete data are obtained on a subsample via intensive follow-up. We discuss assumptions and designs under which the joint density of interest is identified, and present a general approach for constructing estimators in the augmented sample. From this analysis, we show when the initial missingness process itself is identified, and how the associated missing-at-random assumption can be tested. Further, we apply the framework to derive semiparametric efficient and multiply robust estimators of causal average treatment effects from double-sampled observational data when outcome data are initially missing not at random.
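As a rough illustration of the double-sampling idea, the sketch below simulates an outcome that is missing not at random, follows up a random subsample of the initial nonrespondents, and compares a complete-case mean with a simple inverse-probability-weighted mean. It is not the estimator developed in the talk: the function name `simulate`, the logistic missingness model, the known follow-up probability `pi`, and the assumption that every selected nonrespondent yields complete data are all hypothetical choices made for the example.

```python
import math
import random


def simulate(n=20000, pi=0.3, seed=0):
    """Toy double-sampling simulation (illustrative assumptions only).

    n    : number of subjects in the initial sample
    pi   : known probability that an initial nonrespondent is selected
           for intensive follow-up (assumed to always succeed)
    seed : seed for the pseudo-random number generator
    """
    rng = random.Random(seed)
    full_sum = 0.0               # sum of all outcomes (unobservable in practice)
    cc_sum, cc_n = 0.0, 0        # complete-case sum and count
    ds_sum = 0.0                 # weighted sum for the double-sampling estimator

    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        full_sum += y

        # Missing not at random: the chance of observing y depends on y itself.
        p_obs = 1.0 / (1.0 + math.exp(-y))
        observed = rng.random() < p_obs

        if observed:
            cc_sum += y
            cc_n += 1
            ds_sum += y          # initial respondents carry weight 1
        elif rng.random() < pi:
            # Followed-up nonrespondents carry weight 1/pi, so that they
            # stand in for the nonrespondents who were not followed up.
            ds_sum += y / pi

    return {
        "truth": full_sum / n,
        "complete_case": cc_sum / cc_n,
        "double_sampling": ds_sum / n,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

In this toy setting the complete-case mean is biased upward because larger outcomes are more likely to be observed, while the weighted estimator recovers the full-sample mean up to sampling noise. The follow-up probability is known by design here; the identification conditions, efficiency theory, and multiply robust estimators discussed in the abstract go well beyond this sketch.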