CMStatistics 2022
Title: Subdata selection with a large number of variables
Authors: John Stufken - George Mason University (United States) [presenting]
Rakhi Singh - Binghamton University (United States)
Abstract: With ever larger datasets, computational challenges have led to a vast literature on using only part of the data (subdata) for estimation or prediction. This raises the question of how the subdata should be selected from the entire dataset (full data). One possibility is to select the subdata at random from the full data, but this is typically not the best method. The literature contains various suggestions for better alternatives, most of which focus on situations where the number of variables is small to moderate. We introduce a method for linear regression that can be used for big data with a large number of variables.
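The abstract does not describe the new method itself, so the sketch below only contrasts the two kinds of baselines it mentions. The first is uniform random subdata. The second is an IBOSS-style rule (information-based optimal subdata selection, Wang, Yang and Stufken), used here as one example of the "better alternatives" in the literature, not as the authors' proposed method. It assumes NumPy and a simulated linear model; the function names `random_subdata`, `iboss_subdata` and `ols` are hypothetical helpers written for illustration. For each of the p covariates, the IBOSS-style rule keeps the r = k // (2p) not-yet-selected rows with the smallest values of that covariate and the r rows with the largest values.

```python
import numpy as np


def random_subdata(n, k, rng):
    """Indices of k rows drawn uniformly at random without replacement."""
    return rng.choice(n, size=k, replace=False)


def iboss_subdata(X, k):
    """IBOSS-style selection (illustrative sketch, not the abstract's method).

    For each covariate j, keep the r rows with the smallest and the r rows
    with the largest values of X[:, j] among rows not yet chosen,
    where r = k // (2p).
    """
    n, p = X.shape
    r = k // (2 * p)
    if r < 1 or k > n:
        raise ValueError("need 2p <= k <= n for this selection rule")
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)          # rows still eligible
        order = idx[np.argsort(X[idx, j])]       # eligible rows sorted by covariate j
        picked = np.concatenate([order[:r], order[-r:]])  # both tails
        available[picked] = False
        chosen.append(picked)
    return np.concatenate(chosen)


def ols(X, y):
    """Least-squares coefficients for a model with an intercept."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta


# Simulated full data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, p, k = 100_000, 5, 1_000
X = rng.standard_normal((n, p))
beta_true = np.arange(p + 1, dtype=float)        # intercept 0, slopes 1..5
y = beta_true[0] + X @ beta_true[1:] + rng.standard_normal(n)

rand_idx = random_subdata(n, k, rng)
iboss_idx = iboss_subdata(X, k)
print("random subdata:", ols(X[rand_idx], y[rand_idx]))
print("IBOSS subdata: ", ols(X[iboss_idx], y[iboss_idx]))
```

Both subdata fits should land near `beta_true`. The IBOSS-style subdata typically gives a larger determinant of the information matrix and hence more precise slope estimates. The rule also needs r = k // (2p) to be at least one, so with k = 1,000 it cannot be applied once p exceeds 500. This is one plausible illustration of why methods designed for few variables may not carry over to the many-variable setting the abstract targets.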