CMStatistics 2021: Start Registration
View Submission - CMStatistics
Title: Oblivious sketching for logistic regression Authors:  Simon Omlor - TU Dortmund (Germany) [presenting]
Abstract: What guarantees are possible for solving logistic regression in one pass over a data stream? To answer this question, we present the first data oblivious sketch for logistic regression, which is an important generalized linear model for the classification and estimation of Bernoulli probabilities. The sketching matrix can be drawn from an oblivious, i.e., data-independent distribution over sparse random matrices which is simple to implement and can be applied to a data matrix $A\in\mathbb{R}^{n\times d}$ over a turnstile data stream in input-sparsity time $O(nnz(A))$, where $nnz$ denotes the number of non-zeros. This is important and has advantages over existing coreset constructions when it comes to high-velocity streaming applications and when data is not presented in row-order but in an arbitrary unstructured way. The resulting sketch consists of only $\operatorname{poly}(\mu d\log n)$ weighted points, where $\mu$ is a useful parameter that captures the complexity of compressing the data. Solving (weighted) logistic regression on the sketch gives an $O(\log n)$-approximation to the original problem on the full data set. We also show how the same sketch can be slightly adapted to give an $O(1)$-approximation. Our sketches are fast, simple, easy to implement, and our experiments demonstrate that those sketching techniques are useful, practical, and competitive to uniform sampling, SGD, and to state-of-the-art coresets.