CMStatistics 2022
B0764
Title: Implicit Bias of the Step Size in Over-Parameterized Models
Authors: Daniel Soudry - Technion (Israel) [presenting]
Abstract: Focusing on diagonal linear networks and shallow neural networks as models for understanding the implicit bias of underdetermined models, we show that the gradient descent step size can have a large qualitative effect on the implicit bias toward "smoother" predictors, and thus on generalization ability. In particular, for diagonal linear networks, we show that using a large step size on non-centered data can change the implicit bias from a "kernel"-type behavior to a "rich" (sparsity-inducing) regime, even in settings where gradient flow, studied in previous works, would not escape the "kernel" regime. We do so using dynamic stability: we prove that convergence to a dynamically stable global minimum entails a bound on a weighted L1-norm of the linear predictor, i.e., a "rich" regime. We further prove that this bound leads to good generalization in a sparse regression setting.
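To illustrate the setting, here is a minimal NumPy sketch (not the authors' code) of full-batch gradient descent on a diagonal linear network, beta = u * v elementwise, in a sparse regression problem with non-centered data. The dimensions, initialization, mean shift, and step-size values are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 20, 100, 3                     # underdetermined: n < d, k-sparse target
X = rng.normal(size=(n, d)) + 1.0        # non-centered features (mean shifted by 1)
beta_star = np.zeros(d)
beta_star[:k] = 1.0
y = X @ beta_star                        # noiseless sparse regression targets

def train_diag_net(step, iters=100_000, init=0.1):
    """Full-batch GD on L = ||X beta - y||^2 / (2n) with beta = u * v."""
    u = np.full(d, init)
    v = np.full(d, init)
    for _ in range(iters):
        grad_beta = X.T @ (X @ (u * v) - y) / n   # dL/dbeta
        # Chain rule: dL/du = grad_beta * v, dL/dv = grad_beta * u
        u, v = u - step * grad_beta * v, v - step * grad_beta * u
        if not np.isfinite(u).all():              # guard against divergence
            return None
    return u * v

for step in (1e-3, 5e-2):                # small vs. large step size (illustrative)
    beta = train_diag_net(step)
    if beta is None:
        print(f"step={step}: diverged")
    else:
        print(f"step={step}: ||beta||_1 = {np.abs(beta).sum():.3f}, "
              f"train residual = {np.linalg.norm(X @ beta - y):.2e}")
```

Qualitatively, one can compare the L1-norms of the interpolating predictors found at different step sizes: with a very small step size GD tracks gradient flow ("kernel"-like behavior), while a large yet dynamically stable step size tends to yield a smaller weighted L1-norm. The exact thresholds depend on the data and the seed and are the subject of the analysis above.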