CMStatistics 2018
Title: Matrix sketching and supervised classification
Authors: Laura Anderlucci - University of Bologna (Italy)
Roberta Falcone - University of Bologna (Italy) [presenting]
Angela Montanari - Alma Mater Studiorum - Università di Bologna (Italy)
Abstract: Matrix sketching is a recently developed data compression technique. An input matrix $A$ is efficiently approximated by a smaller matrix $B$, so that $B$ preserves most of the properties of $A$ up to some guaranteed approximation ratio. As a result, numerical operations on large data sets become faster. Sketching algorithms generally use random projections to compress the original data set, and this stochastic generation process makes them amenable to statistical analysis. The statistical properties of sketched regression algorithms have already been widely studied. We study the performance of sketching algorithms in the supervised classification context, in terms of both misclassification rate and boundary approximation, as the degree of sketching increases. We also address, through sketching, the issue of unbalanced classes, which hampers most common classification methods.
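As a rough illustration of the idea of approximating $A$ by a smaller matrix $B$ through a random projection, the following minimal sketch uses a Gaussian projection, which is only one common choice; the abstract does not say which sketching scheme the authors use. The function name `gaussian_sketch`, the matrix sizes, and the seeds are assumptions made for this example. It compares the Gram matrices $A^\top A$ and $B^\top B$ as one simple way to see how much structure survives the compression.

```python
import numpy as np

def gaussian_sketch(A, k, seed=0):
    """Return a k x d sketch B = S A, where S is a k x n Gaussian random projection.

    Hypothetical helper for illustration; entries of S have variance 1/k,
    so that B.T @ B equals A.T @ A in expectation.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    S = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, n))
    return S @ A

# Synthetic input (assumed sizes): n = 2000 observations, d = 5 variables.
rng = np.random.default_rng(1)
A = rng.normal(size=(2000, 5))

# Compress the 2000 rows down to k = 400 sketched rows.
B = gaussian_sketch(A, k=400)

# Relative Frobenius error between the Gram matrices of A and B.
gram_A = A.T @ A
gram_B = B.T @ B
rel_err = np.linalg.norm(gram_B - gram_A) / np.linalg.norm(gram_A)
print(B.shape, rel_err)
```

Lowering `k` gives a stronger compression and, typically, a larger approximation error, which corresponds to the "degree of sketching" mentioned in the abstract.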