Title: Convergence of mean field gradient Langevin dynamics for optimizing two-layer neural networks
Authors: Taiji Suzuki - University of Tokyo / RIKEN-AIP (Japan) [presenting]
Atsushi Nitanda - Kyushu Institute of Technology (Japan)
Denny Wu - University of Toronto (Canada)
Kazusato Oko - The University of Tokyo (Japan)
Abstract: We discuss the optimization of two-layer neural networks via gradient Langevin dynamics in the mean-field regime. First, we establish a linear convergence guarantee for the mean-field gradient Langevin algorithm in the infinite-width limit under a uniform log-Sobolev inequality condition. Next, we propose several optimization methods for the finite-width, discrete-time setting. In particular, we introduce an algorithm, based on the stochastic dual coordinate ascent method, that enjoys linear convergence for finite-sum loss functions. Finally, we discuss the linear convergence of the vanilla gradient Langevin dynamics without the infinite-width assumption, but under slightly different regularity conditions.
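To make the setting concrete, the following is a minimal sketch (not the authors' exact algorithm) of the finite-width, discrete-time object being analyzed: noisy gradient descent on a two-layer network with mean-field scaling, where each step adds Gaussian noise of scale sqrt(2*eta/beta). All hyperparameters (width m, step size eta, inverse temperature beta, regularization lam) and the toy regression target are illustrative assumptions, not values from the abstract.

```python
import numpy as np

# Illustrative sketch: discrete-time gradient Langevin dynamics for a
# two-layer network f(x) = (1/m) * sum_j a_j * tanh(w_j . x)
# (mean-field 1/m scaling). Hyperparameters are assumed, not from the talk.

rng = np.random.default_rng(0)
d, m, n = 2, 50, 100               # input dim, network width, sample size
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                # toy regression target

W = rng.normal(size=(m, d))        # first-layer weights ("particles")
a = rng.normal(size=m)             # second-layer weights

eta, beta, lam = 0.5, 1e4, 1e-3    # step size, inverse temperature, L2 reg

def predict(W, a, X):
    # mean-field scaling: average over the m neurons
    return np.tanh(X @ W.T) @ a / m

for step in range(2000):
    h = np.tanh(X @ W.T)           # (n, m) hidden activations
    r = predict(W, a, X) - y       # residuals
    # gradients of (1/2n)||f - y||^2 + (lam/2)(||W||^2 + ||a||^2)
    ga = h.T @ r / (n * m) + lam * a
    gW = ((r[:, None] * (1 - h**2) * a).T @ X) / (n * m) + lam * W
    # Langevin step: gradient descent plus sqrt(2*eta/beta) Gaussian noise
    a = a - eta * ga + np.sqrt(2 * eta / beta) * rng.normal(size=a.shape)
    W = W - eta * gW + np.sqrt(2 * eta / beta) * rng.normal(size=W.shape)

mse = np.mean((predict(W, a, X) - y) ** 2)
```

In the mean-field limit m → ∞, the empirical distribution of the particles (a_j, w_j) follows a distributional gradient flow, which is the object whose linear convergence the abstract establishes under a uniform log-Sobolev inequality.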