Title: From shallow to deep: Theoretical insights into training of neural networks
Authors: Mahdi Soltanolkotabi - University of Southern California (United States) [presenting]
Abstract: Neural network architectures (a.k.a. deep learning) have recently emerged as powerful tools for automatic knowledge extraction from data, leading to major breakthroughs in applications spanning visual object classification to speech recognition and natural language processing. Despite their wide empirical use, a mathematical understanding of the success of these architectures remains a mystery. One challenge is that training neural networks corresponds to extremely high-dimensional and nonconvex optimization problems, and it is not clear how to provably solve them to global optimality. While training neural networks is known to be intractable in general, simple local search heuristics are often surprisingly effective at finding global or high-quality optima on real or randomly generated data. We will discuss some results explaining the success of these heuristics. First, we will present results characterizing the training landscape of single-hidden-layer networks, demonstrating that when the number of hidden units is sufficiently large, the optimization landscape has favorable properties that guarantee global convergence of (stochastic) gradient descent to a model with zero training error. Second, we will introduce a de-biased variant of gradient descent called Centered Gradient Descent (CGD). We will show that, unlike gradient descent, CGD enjoys fast convergence guarantees for arbitrary deep convolutional neural networks with large stride lengths.
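The overparameterization phenomenon in the first result can be illustrated empirically. The sketch below (a minimal toy experiment, not the speaker's actual construction or proof setting) trains a one-hidden-layer ReLU network with many more hidden units than training samples using full-batch gradient descent; the training loss is driven toward zero even on random labels. All sizes, the step size, and the initialization scale are illustrative assumptions.

```python
import numpy as np

# Toy illustration of the overparameterized regime: a one-hidden-layer ReLU
# network, f(x) = v . relu(W x), trained by full-batch gradient descent.
# All constants below are illustrative choices, not from the talk.
rng = np.random.default_rng(0)
n, d, k = 10, 5, 400            # few samples, many hidden units
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)      # arbitrary labels (even random ones can be fit)

W = rng.standard_normal((k, d)) / np.sqrt(d)      # trainable hidden weights
v = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)  # fixed output weights

def loss(W):
    """Mean squared training error of the network."""
    return 0.5 * np.mean((np.maximum(X @ W.T, 0.0) @ v - y) ** 2)

lr, steps = 0.5, 3000
losses = [loss(W)]
for _ in range(steps):
    pre = X @ W.T                             # pre-activations, shape (n, k)
    r = np.maximum(pre, 0.0) @ v - y          # residuals, shape (n,)
    mask = (pre > 0.0).astype(float)          # ReLU derivative
    grad = (mask * np.outer(r, v)).T @ X / n  # dL/dW, shape (k, d)
    W -= lr * grad
    losses.append(loss(W))
```

With the width much larger than the sample count, the random initialization already puts the iterates in a benign region of the landscape, and plain gradient descent steadily shrinks the training loss, in line with the zero-training-error guarantee described in the abstract.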