A0198
Title: On data subsampling for Poisson regression
Authors: Han Cheng Lie - University of Potsdam (Germany)
Alexander Munteanu - TU Dortmund (Germany) [presenting]
Abstract: The aim is to develop and analyze new data subsampling techniques for Poisson regression with count data $y\in\mathbb{N}$. In particular, we consider the Poisson generalized linear model with the identity (ID) and square-root link functions. We study \emph{coresets}: small weighted subsets of the data that approximate the log-likelihood up to a factor of $1\pm\varepsilon$ for all parameter values. By introducing a novel complexity parameter $\rho$ and a domain shifting approach, we show that sublinear coresets with a $1\pm\varepsilon$ approximation guarantee exist when $\rho$ is small. In particular, the data can be reduced to a polylogarithmic number of input points. We show that the dependence on the other input parameters can also be bounded, though not always logarithmically. Specifically, the square-root link admits an $O(\log(y_{\max}))$ dependence on the largest count $y_{\max}$, while the ID link requires a $\Theta(\sqrt{y_{\max}/\log(y_{\max})})$ dependence.
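To make the objects above concrete, here is a minimal Python sketch. It assumes the standard Poisson negative log-likelihood $f(\beta)=\sum_i w_i\,(\mu_i - y_i\log\mu_i + \log y_i!)$ with mean $\mu_i=(x_i^\top\beta)^p$, where $p=1$ gives the ID link and $p=2$ the square-root link, and it assumes $x_i^\top\beta>0$ for every point. The names `poisson_nll`, `relative_error` and `sample_poisson` are hypothetical. The weighted subset in the demo is a naive uniform subsample with weights $n/m$, not the coreset construction described in this abstract; it only shows how a weighted subset is scored against the $1\pm\varepsilon$ criterion for a few fixed $\beta$.

```python
import math
import random


def poisson_nll(X, y, beta, weights=None, p=1):
    """Weighted Poisson negative log-likelihood with a p-th root link.

    p=1: identity (ID) link, mu = x^T beta.
    p=2: square-root link, sqrt(mu) = x^T beta, i.e. mu = (x^T beta)^2.
    Assumption: every linear predictor x^T beta is strictly positive.
    """
    if weights is None:
        weights = [1.0] * len(y)
    total = 0.0
    for xi, yi, wi in zip(X, y, weights):
        eta = sum(a * b for a, b in zip(xi, beta))
        if eta <= 0:
            raise ValueError("linear predictor x^T beta must be positive")
        mu = eta ** p
        # -log P(Y = yi | mu) = mu - yi * log(mu) + log(yi!) >= 0
        total += wi * (mu - yi * math.log(mu) + math.lgamma(yi + 1))
    return total


def relative_error(full, approx):
    """|f_C(beta) - f(beta)| / f(beta); a coreset needs this <= eps for all beta."""
    return abs(approx - full) / full


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 10_000, 500
    # Positive features so that x^T beta > 0 for the positive betas below.
    X = [[1.0, rng.uniform(0.5, 2.0)] for _ in range(n)]
    true_beta = [1.0, 2.0]
    y = [sample_poisson(rng, sum(a * b for a, b in zip(x, true_beta))) for x in X]

    # Naive uniform subsample with replacement, weight n/m per point
    # (illustration only, NOT the coreset construction of the abstract).
    idx = [rng.randrange(n) for _ in range(m)]
    Xc, yc, wc = [X[i] for i in idx], [y[i] for i in idx], [n / m] * m

    for p in (1, 2):
        for beta in ([1.0, 2.0], [0.5, 3.0]):
            full = poisson_nll(X, y, beta, p=p)
            core = poisson_nll(Xc, yc, beta, weights=wc, p=p)
            print(f"p={p} beta={beta} rel. error={relative_error(full, core):.4f}")
```

A genuine coreset must satisfy the bound simultaneously for all admissible $\beta$, so checking a handful of fixed $\beta$ values, as in this demo, can refute a candidate subset but cannot certify it.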