Title: A feature distributed framework for large-scale sparse regression
Authors: Chenlei Leng - Warwick (United Kingdom) [presenting]
Abstract: Large-scale data with a large number of features are increasingly encountered. A framework is presented for sparse high-dimensional linear regression that distributes features across multiple machines. Our method performs similarly to the infeasible oracle estimator in a centralized setting, in which all the data are fitted on a single machine. Remarkably, this performance is achieved for elliptically distributed features, which include Gaussian variables as a special case, for any heavy-tailed noise with a finite second moment, for sparse and weakly sparse signals, and for most popular sparse regression methods. Rather surprisingly, we show that a lower bound on the convergence rate of the resulting estimator does not depend on the number of machines. Extensive numerical studies are presented to illustrate the method's competitive performance.