Title: Non-asymptotic analysis of local-SGD
Authors: Aymeric Dieuleveut - EPFL (Switzerland) [presenting]
Abstract: Mini-batch stochastic gradient descent methods are the current state-of-the-art in large-scale distributed machine learning. Such methods are limited by the massive communication bottleneck they introduce. In this scenario, a ``local-SGD'' model, in which machines communicate their independent models only infrequently, might be preferred over ``large mini-batch'' schemes to balance the computation-communication trade-off. A surprising lack of understanding of the behavior of these local-SGD methods is our primary motivation. We propose a simple non-asymptotic error analysis, which enables comparison at one extreme to one-shot averaging (i.e., a single communication round among independent workers) and at the other to mini-batch averaging (i.e., communicating at every step), while underlining the computation-communication-performance trade-offs. This is one of the first non-asymptotic analyses for local-SGD. We also give adaptive lower bounds on the frequency of communication for local-SGD to perform optimally, i.e., as well as mini-batch averaging. Our results encompass ubiquitous algorithms such as least-squares and logistic regression, in the online as well as the finite-horizon setting. Moreover, they hold for large step-sizes and provide insights into the actual behavior of the stochastic process.
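
The interpolation the abstract describes can be sketched in a few lines. Below is a minimal, hypothetical simulation (not the authors' code) of local-SGD on a synthetic online least-squares problem: each of M workers runs H local SGD steps between communication rounds, where the models are averaged. All dimensions, step-sizes, and the noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (assumptions, not from the paper):
d, M, steps, H, lr = 5, 4, 200, 10, 0.1   # H = steps between communication rounds
w_star = rng.normal(size=d)               # ground-truth least-squares weights

def sgd_step(w):
    """One stochastic gradient step on a fresh online sample (x, y)."""
    x = rng.normal(size=d)
    y = x @ w_star + 0.1 * rng.normal()
    grad = (x @ w - y) * x                # gradient of 0.5 * (x.w - y)^2
    return w - lr * grad

# local-SGD: each worker takes H independent local steps, then all
# models are averaged and broadcast back (one communication round).
workers = [np.zeros(d) for _ in range(M)]
for t in range(1, steps + 1):
    workers = [sgd_step(w) for w in workers]
    if t % H == 0:                        # communication round
        avg = np.mean(workers, axis=0)
        workers = [avg.copy() for _ in range(M)]

w_local = np.mean(workers, axis=0)        # final averaged model
```

Setting H = 1 recovers mini-batch averaging (communication at every step), while H = steps recovers one-shot averaging (a single communication round at the end); the abstract's trade-off concerns how large H can be while still matching mini-batch performance.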