Title: Provable training of certain finite size neural nets at depth 2
Authors: Anirbit Mukherjee - University of Pennsylvania (United States) [presenting]
Ramchandran Muthukumar - Johns Hopkins University (United States)
Abstract: Explaining the phenomenon of deep learning is one of the paramount mathematical mysteries of our time. Neural nets can be made to paint while imitating classical art styles or to play chess better than any machine or human ever, and they seem to be the closest we have ever come to achieving ``artificial intelligence''. But trying to reason about these successes quickly lands us in a plethora of extremely challenging mathematical questions, typically about discrete stochastic processes. Some of these questions remain unsolved even for the smallest neural nets! We will give a brief introduction to neural nets and describe our recent work on the provable training of finitely large, depth-2, single-filter generalized convolutional nets. Firstly, we will explain how, under certain structural and mild distributional conditions, iterative algorithms of ours like ``Neuro-Tron'', which do not use a gradient oracle, can often be proven to train nets with the time/sample complexity expected of gradient-based methods, but in regimes where the usual algorithms like (S)GD remain unproven. Our theorems include the particularly challenging regime of non-realizable data. Secondly, we will explain how, for a single ReLU gate, a slight modification to SGD yields data-poisoning-resilient training.
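As an illustrative sketch only (not the authors' exact Neuro-Tron algorithm or its guarantees): for a single ReLU gate, a Tron-style iterate in the spirit of the abstract updates with the raw residual `(y - relu(w.x)) x` rather than the true gradient, which would carry a ReLU-derivative factor, so no gradient oracle is queried. All function names, parameter values, and the toy realizable data below are our own assumptions for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tron_train(X, y, eta=0.1, steps=500):
    """Tron-style training of a single ReLU gate (illustrative sketch).

    The update w += eta * mean((y - relu(X w)) x) omits the relu'(w.x)
    factor that the true gradient of the squared loss would contain,
    so the iteration does not use a gradient oracle.
    """
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        residual = y - relu(X @ w)          # per-sample residuals
        w += eta * X.T @ residual / len(y)  # full-batch Tron step
    return w

# Toy realizable instance: labels generated by a planted ReLU gate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_star = rng.normal(size=5)
y = relu(X @ w_star)
w_hat = tron_train(X, y)
```

On Gaussian-like data such as this toy instance, the iterate drives the residuals to zero and recovers the planted weights; the talk's results concern harder regimes (non-realizable data, generalized convolutional nets) that this sketch does not capture.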