View Submission - SDS2022
A0188
Title: Machine learning based mass imputation approaches for combining probability sample and nonprobability sample
Authors: Sixia Chen - University of Oklahoma (United States) [presenting]
Abstract: Although probability samples have been regarded as the gold standard for collecting information in population-based studies, non-probability samples are frequently used in practice because of their low cost, their convenience, and the difficulty of creating sampling frames. Naive estimates based on non-probability samples without any adjustment may be misleading because of selection bias. Recently, valid data integration approaches, including mass imputation, propensity score weighting, and calibration, have been used to improve the representativeness of non-probability samples. However, the effectiveness of mass imputation approaches depends on the underlying model assumptions. We propose and compare several modern machine learning-based mass imputation approaches, including generalized additive models, regression trees, random forests, XGBoost, support vector machines, and deep learning. Machine learning-based approaches have been shown to be more robust than the parametric mass imputation approach when the underlying model assumptions fail. In addition, deep learning has been shown to be the most effective for handling hierarchical non-linear data structures. We evaluate the proposed methods using both a simulation study and a real application.
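The mass imputation idea described in the abstract can be sketched as follows: fit a prediction model for the outcome on the non-probability sample, where the outcome is observed, predict the outcome for every unit in the probability sample, and combine the predictions with the design weights. This is only a minimal illustration under stated assumptions, not the authors' implementation. The function names (`knn_fit`, `mass_imputation_mean`, `demo`), the simulated population, and the selection mechanism are all hypothetical, and a simple k-nearest-neighbour learner stands in for the machine learning methods listed in the abstract so that the example needs only the Python standard library.

```python
import random


def knn_fit(x_train, y_train, k=5):
    """Return a predictor averaging y over the k nearest training points.

    Stand-in learner (assumption): any regression model could be used here.
    Works on a single numeric covariate.
    """
    pairs = list(zip(x_train, y_train))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def mass_imputation_mean(fit, x_np, y_np, x_p, w_p):
    """Mass imputation estimate of the population mean of y.

    fit  : function (x, y) -> predictor, trained on the non-probability sample
    x_np : covariates in the non-probability sample (y observed)
    y_np : outcomes in the non-probability sample
    x_p  : covariates in the probability sample (y not observed)
    w_p  : design weights of the probability sample
    """
    predict = fit(x_np, y_np)
    y_hat = [predict(x) for x in x_p]  # imputed outcomes
    return sum(w * y for w, y in zip(w_p, y_hat)) / sum(w_p)


def demo(seed=0, n_pop=2000, n_prob=300):
    """Simulate a population and compare naive and mass imputation estimates."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n_pop)]
    # Non-linear outcome model (illustrative assumption).
    y = [10 * xi ** 2 + rng.gauss(0, 1) for xi in x]
    true_mean = sum(y) / n_pop

    # Non-probability sample: units with larger x are more likely to be
    # included, which induces selection bias in the naive mean.
    np_idx = [i for i in range(n_pop) if rng.random() < x[i]]
    x_np = [x[i] for i in np_idx]
    y_np = [y[i] for i in np_idx]
    naive = sum(y_np) / len(y_np)

    # Probability sample: simple random sample with equal design weights;
    # only the covariate is used, not the outcome.
    p_idx = rng.sample(range(n_pop), n_prob)
    x_p = [x[i] for i in p_idx]
    w_p = [n_pop / n_prob] * n_prob

    mi = mass_imputation_mean(knn_fit, x_np, y_np, x_p, w_p)
    return true_mean, naive, mi


if __name__ == "__main__":
    true_mean, naive, mi = demo()
    print(f"true mean:        {true_mean:.3f}")
    print(f"naive estimate:   {naive:.3f}")
    print(f"mass imputation:  {mi:.3f}")
```

Because `mass_imputation_mean` only needs a function that returns a predictor, a random forest or another learner from the abstract's list could be passed in place of `knn_fit` without changing the estimator itself.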