CMStatistics 2020
Title: Maximum sampled likelihood estimation for informative subsampling
Authors: HaiYing Wang - University of Connecticut (United States) [presenting]
Jae Kwang Kim - Iowa State University (United States)
Abstract: Subsampling is an effective approach to extracting useful information from massive data sets when computing resources are limited. Existing investigations focus on developing better sampling procedures and deriving subsampling probabilities with higher estimation efficiency. After a subsample is taken from the full data, most available methods use an inverse probability weighted target function to define the estimator. This type of weighted estimator reduces the contributions of the more informative data points, and thus does not fully utilize the information in the selected subsample. The focus here is on parameter estimation with a selected subsample. We propose using the maximum sampled likelihood estimator (MSLE) based on the sampled data. We establish the asymptotic normality of the MSLE and prove that its variance-covariance matrix attains the lower bound for asymptotically unbiased estimators. In particular, the MSLE has higher estimation efficiency than the weighted estimator. We further discuss the asymptotic results under the L-optimal subsampling probabilities. We illustrate the estimation procedure with generalized linear models. Numerical experiments are provided to evaluate the practical performance of the proposed method.
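To make the contrast between the two estimators concrete, the following is a minimal sketch, not the authors' implementation. It assumes a logistic regression with one covariate and a simple response-dependent Poisson subsampling rule (keep every y = 1, keep each y = 0 with probability `keep_zero`); the rule, the parameter values, and all function names are illustrative assumptions, and the L-optimal probabilities mentioned in the abstract are not used. Under this assumed rule the likelihood of a sampled point, conditional on being selected, is again logistic with the linear predictor shifted by log(pi_1 / pi_0), so maximizing the sampled likelihood reduces to an unweighted fit with a known offset. The weighted estimator instead fits the original model with weights 1 / pi.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log_lik(theta, xs, ys, ws, offs):
    """Weighted Bernoulli log-likelihood, linear predictor b0 + b1*x + offset."""
    total = 0.0
    for x, y, w, o in zip(xs, ys, ws, offs):
        eta = theta[0] + theta[1] * x + o
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += w * (y * eta - softplus)
    return total


def fit_logistic(xs, ys, ws=None, offs=None, n_iter=100, tol=1e-10):
    """Newton's method with step halving for a two-parameter logistic model."""
    n = len(xs)
    ws = [1.0] * n if ws is None else ws
    offs = [0.0] * n if offs is None else offs
    theta = [0.0, 0.0]
    cur = log_lik(theta, xs, ys, ws, offs)
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w, o in zip(xs, ys, ws, offs):
            p = sigmoid(theta[0] + theta[1] * x + o)
            r = w * (y - p)          # score contribution
            v = w * p * (1.0 - p)    # information contribution
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        # Solve the 2x2 system (information matrix) * d = score.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the step until the log-likelihood does not decrease.
        step = 1.0
        while True:
            cand = [theta[0] + step * d0, theta[1] + step * d1]
            new = log_lik(cand, xs, ys, ws, offs)
            if new >= cur or step < 1e-8:
                break
            step /= 2.0
        theta, cur = cand, new
        if step * max(abs(d0), abs(d1)) < tol:
            break
    return theta


def run_demo(seed=0, n=50_000, keep_zero=0.05):
    """Simulate full data, subsample by response, and fit both estimators."""
    rng = random.Random(seed)
    theta_true = [-3.0, 1.0]  # assumed values, chosen so that y = 1 is rare
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(theta_true[0] + theta_true[1] * x) else 0
        pi = 1.0 if y == 1 else keep_zero  # assumed subsampling probability
        if rng.random() < pi:
            xs.append(x)
            ys.append(y)
    pis = [1.0 if y == 1 else keep_zero for y in ys]
    # Inverse probability weighted estimator: original model, weights 1/pi.
    theta_ipw = fit_logistic(xs, ys, ws=[1.0 / p for p in pis])
    # Sampled-likelihood estimator: unweighted fit with offset log(pi_1/pi_0).
    offset = math.log(1.0 / keep_zero)
    theta_msle = fit_logistic(xs, ys, offs=[offset] * len(xs))
    return theta_true, theta_ipw, theta_msle


if __name__ == "__main__":
    truth, ipw, msle = run_demo()
    print("true:", truth)
    print("IPW :", ipw)
    print("MSLE:", msle)
```

A single run only indicates that both estimators recover the assumed parameters; comparing their variances, which is the point of the efficiency result, would require repeating the simulation over many seeds.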