Title: Subsampling from big datasets through optimal design
Authors: Chiara Tommasi - University of Milan (Italy) [presenting]
Laura Deldossi - Università Cattolica del Sacro Cuore (Italy)
Abstract: Big Data are large volumes of digital information that rarely come from properly planned surveys and, as a consequence, often contain redundant observations. Here a big dataset is treated as a finite population generated by a super-population model. When the aim is to answer a specific question of interest, we suggest selecting a subsample of observations that retains most of the information relevant to that inferential goal. Selection methods based on the theory of optimal design take the inferential purpose into account and thus perform better than standard sampling schemes.
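To make the idea of design-driven subsampling concrete, the following is a minimal sketch and not the authors' procedure. It assumes a simple linear regression super-population model y = b0 + b1*x, uses D-optimality (maximizing det(X'X)) as the design criterion, and selects points with a greedy heuristic; the function names `det_info` and `greedy_d_subsample`, the uniform covariate, and the sizes used are all hypothetical choices made for illustration.

```python
import random


def det_info(xs):
    """det(X'X) for the model y = b0 + b1*x evaluated at design points xs."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return n * sxx - sx * sx


def greedy_d_subsample(xs, k):
    """Greedily select k distinct indices of xs.

    At each step the point that gives the largest det(X'X) for the
    enlarged subsample is added. This is a heuristic, not an exact
    D-optimal search.
    """
    remaining = list(range(len(xs)))
    chosen = []
    n, sx, sxx = 0, 0.0, 0.0  # running sums for the current subsample

    def det_if_added(i):
        s = sx + xs[i]
        return (n + 1) * (sxx + xs[i] * xs[i]) - s * s

    for _ in range(k):
        best = max(remaining, key=det_if_added)
        remaining.remove(best)
        chosen.append(best)
        n += 1
        sx += xs[best]
        sxx += xs[best] * xs[best]
    return chosen


# Hypothetical finite population: 1000 covariate values drawn uniformly on [0, 1].
rng = random.Random(0)
population = [rng.uniform(0.0, 1.0) for _ in range(1000)]
k = 20

opt_idx = greedy_d_subsample(population, k)        # design-driven subsample
srs_idx = rng.sample(range(len(population)), k)    # simple random sample

det_opt = det_info([population[i] for i in opt_idx])
det_srs = det_info([population[i] for i in srs_idx])
print("det(X'X), greedy D-optimal subsample:", det_opt)
print("det(X'X), simple random sample:     ", det_srs)
```

Under these assumptions the greedy subsample concentrates on covariate values near the two ends of the range, which is where a straight-line fit is most informative about the slope, whereas a simple random sample of the same size spreads over the whole range and yields a smaller information determinant.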