Title: Statistics of stochastic gradient descent: Stability, efficiency, and inference
Authors: Panagiotis Toulis - University of Chicago (United States) [presenting]
Abstract: Stochastic gradient descent (SGD) is remarkably multi-faceted: to machine learners it is a powerful optimization method, while to statisticians it is a method for iterative estimation. Although several important results are known about the optimization properties of SGD, surprisingly little is known about its statistical properties. We will review recent results on doing statistics with SGD, including analytic formulas for the asymptotic covariance matrix of SGD-based estimators and a numerically stable variant of SGD with implicit updates. Together, these results open up the possibility of principled statistical analysis with SGD, including classical inference and hypothesis testing. On inference specifically, we present current work showing that, with an appropriate choice of learning rate, the asymptotic covariance matrix of SGD is isotropic and parameter-free. As such, some SGD-based estimators can easily be transformed into pivotal quantities, which substantially simplifies inference. This is a unique and remarkable property of SGD, even compared to popular estimation methods favored by statisticians, such as maximum likelihood, and it highlights the untapped potential of SGD for fast and principled estimation with large data sets.
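To make the implicit-update variant mentioned in the abstract concrete, here is a minimal sketch for least-squares regression, where the implicit equation theta_n = theta_{n-1} + g_n (y_n - x_n' theta_n) x_n can be solved in closed form. The synthetic data, learning-rate schedule, and function name are illustrative assumptions, not material from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic linear model: y = x' theta + noise.
d, n = 5, 20000
theta_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ theta_true + 0.1 * rng.normal(size=n)

def implicit_sgd(X, y, gamma0=1.0):
    """Implicit SGD for squared loss.

    The update theta_n = theta_{n-1} + g_n * (y_n - x_n' theta_n) * x_n
    uses theta_n on both sides; for squared loss it reduces to an explicit
    step shrunken by 1 / (1 + g_n * ||x_n||^2), which is what stabilizes
    the iteration against large learning rates.
    """
    theta = np.zeros(X.shape[1])
    for i in range(len(y)):
        g = gamma0 / (1 + i)            # Robbins-Monro rate g_n ~ 1/n (assumed schedule)
        x = X[i]
        resid = y[i] - x @ theta
        theta = theta + (g / (1.0 + g * (x @ x))) * resid * x
    return theta

theta_hat = implicit_sgd(X, y)
print(np.max(np.abs(theta_hat - theta_true)))
```

The shrinkage factor is the practical difference from standard SGD: an explicit update with the same large initial learning rate can diverge, while the implicit step remains bounded.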