Title: Statistical learning of big dependent data
Authors: Ruey Tsay - The University of Chicago Booth School of Business (United States) [presenting]
Abstract: In the modern big data environment, most real applications involve data that are spatially and/or temporally dependent. For example, in environmental studies, PM2.5 indexes are typically collected at various monitoring stations over a long period of time. Yet most of the methods for analyzing such data developed in the machine learning and statistical literature were derived under the independence assumption. Dynamical dependence in the dependent variable of interest, together with cross-sectional and serial dependence among the predictors, may lead to erroneous inference if overlooked. We present some methods that can handle big dependent data and provide theoretical justifications for some commonly used methods in the presence of dependent data. Real examples are used throughout to illustrate the effects of dynamical dependence on the traditional methods and to show the gains obtained by the proposed methods.