CMStatistics

B1041
**Title:** Bounding the width of neural networks via coupled initialization
**Authors:** Simon Omlor - TU Dortmund (Germany) **[presenting]**

**Abstract:** Two-layer ReLU neural networks with cross-entropy or squared loss can be viewed as logistic or $\ell_2$ regression, respectively, in the infinite-dimensional Neural Tangent Kernel (NTK) space when the number of neurons in the hidden layer tends to infinity. A common method for training such networks is to initialize all weights as independent Gaussian vectors. We observe that by instead initializing the weights in independent pairs, where each pair consists of two identical Gaussian vectors, we can significantly improve the analysis of convergence to zero training error. Specifically, our technique reduces the number of hidden neurons required by the network, which corresponds to a dimensionality reduction for the NTK. In the under-parameterized setting with logistic loss, we improve previous width bounds from roughly $\gamma^{-8}$ to $\gamma^{-2}$, where $\gamma$ denotes the separation margin in the NTK space. We also present new lower bounds that corroborate the tightness of our analysis. Similar techniques improve previous width bounds in the over-parameterized setting with squared loss from roughly $n^4$ to $n^2$.
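To illustrate the pairing idea, the following is a minimal NumPy sketch of a coupled initialization for a two-layer ReLU network. It is a hypothetical illustration, not the authors' code: each Gaussian hidden-weight vector is duplicated within a pair, and the two neurons of a pair are given opposite output-layer signs, so the network output is exactly zero at initialization while the NTK-style gradient features are unchanged.

```python
import numpy as np

def coupled_init(num_pairs, d, seed=0):
    """Draw num_pairs Gaussian vectors and duplicate each within a pair.

    Returns hidden weights W of shape (2*num_pairs, d) and output-layer
    signs a of shape (2*num_pairs,) with opposite signs within each pair.
    (Names and normalization are illustrative assumptions.)
    """
    rng = np.random.default_rng(seed)
    W_half = rng.standard_normal((num_pairs, d))
    W = np.repeat(W_half, 2, axis=0)      # each Gaussian row appears twice
    a = np.tile([1.0, -1.0], num_pairs)   # opposite output signs per pair
    return W, a

def two_layer_relu(x, W, a):
    """f(x) = sum_r a_r * ReLU(<w_r, x>) for a two-layer ReLU network."""
    return a @ np.maximum(W @ x, 0.0)
```

Because the two neurons in a pair compute identical ReLU activations with opposite output signs, their contributions cancel, so `two_layer_relu(x, W, a)` evaluates to zero for every input at initialization; this vanishing initial output is what simplifies the convergence analysis.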