View Submission - EcoSta2022
A0914
Title: Model-robust subdata selection for big data
Authors: Chenlu Shi - Colorado State University (United States) [presenting]
Boxin Tang - Simon Fraser University (Canada)
Abstract: Subdata selection is necessary because of the challenges that arise when big data must be analyzed with limited computing resources. Existing work on subdata selection relies heavily on a specified model, which calls for an approach that is robust to model misspecification. We propose the use of space-filling designs for subdata selection and examine a fast algorithm for its implementation. The algorithm performs surprisingly well when compared to the reference distribution given by complete search. Simulations are conducted to compare our approach with the recently introduced information-based optimal subdata selection (IBOSS) method, and the results show that our method is robust not only to model misspecification but also to model uncertainty. While robustness to model misspecification and uncertainty may be expected from the nature of space-filling designs, we discover that our method enjoys an additional robustness property when there are substantial correlations among covariates.
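To illustrate the general idea of space-filling subdata selection, the sketch below picks rows greedily so that each new row is as far as possible from those already chosen. The abstract does not describe the authors' fast algorithm, so this is not their method: it is a generic farthest-point (maximin-distance) heuristic. The function name `greedy_maximin_subdata`, its parameters, and the min-max scaling of covariates are assumptions made for this example.

```python
import numpy as np


def greedy_maximin_subdata(X, k, seed=0):
    """Return indices of k rows of X chosen by a farthest-point heuristic.

    Hypothetical illustration only; not the algorithm from the abstract.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of rows")

    # Assumption: rescale each covariate to [0, 1] so that no single
    # covariate dominates the Euclidean distances.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns contribute zero distance
    Z = (X - lo) / span

    # Start from a random row, then repeatedly add the row whose distance
    # to the nearest already-selected row is largest.
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]
    min_dist = np.linalg.norm(Z - Z[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        # Update each row's distance to its nearest selected row.
        min_dist = np.minimum(min_dist, np.linalg.norm(Z - Z[nxt], axis=1))
    return np.array(selected)


# Usage: pick 20 rows out of 1000 simulated observations on 3 covariates.
X_full = np.random.default_rng(1).uniform(size=(1000, 3))
idx = greedy_maximin_subdata(X_full, k=20)
X_sub = X_full[idx]
```

Each step costs one pass over the full data, so the total work is proportional to n times k. No response model enters the selection, which is the sense in which a space-filling choice does not depend on a specified model.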