B1676
Title: On the Generalization Power of the Overfitted Three-Layer Neural Tangent Kernel Model
Authors: Peizhong Ju - The Ohio State University (United States) [presenting]
Xiaojun Lin - Purdue University (United States)
Ness Shroff - The Ohio State University (United States)
Abstract: The generalization performance of overparameterized 3-layer NTK models is studied. We show that, for a specific set of ground-truth functions (which we refer to as the "learnable set"), the test error of the overfitted 3-layer NTK is upper bounded by an expression that decreases with the number of neurons in the two hidden layers. Unlike the 2-layer NTK, which has only one hidden layer, the 3-layer NTK involves interactions between two hidden layers. Our upper bound reveals that, between the two hidden layers, the test error decreases faster with respect to the number of neurons in the second hidden layer (the one closer to the output) than with respect to that in the first hidden layer (the one closer to the input). We also show that the learnable set of the 3-layer NTK without bias is no smaller than that of 2-layer NTK models with various choices of bias in the neurons. However, in terms of actual generalization performance, our results suggest that the 3-layer NTK is much less sensitive to the choice of bias than the 2-layer NTK, especially when the input dimension is large.
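The abstract does not spell out the model, so the following is only a rough numpy sketch of the setting it describes: an overfitted NTK model built from the gradient features of a bias-free 3-layer ReLU network at random initialization, fit with the minimum-norm interpolator. The widths, the linear ground-truth function, and the unit-sphere inputs are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n1, n2 = 10, 64, 64      # input dim; widths of the first and second hidden layers
n_train, n_test = 100, 200

relu = lambda z: np.maximum(z, 0.0)
step = lambda z: (z > 0).astype(float)   # derivative of ReLU

# Random initialization of a bias-free 3-layer net f(x) = v.T @ relu(W2 @ relu(W1 @ x)).
W1 = rng.normal(size=(n1, d)) / np.sqrt(d)
W2 = rng.normal(size=(n2, n1)) / np.sqrt(n1)
v = rng.choice([-1.0, 1.0], size=n2) / np.sqrt(n2)   # output layer kept fixed

def ntk_features(X):
    """Gradient features at initialization: the NTK model is linear in
    grad_theta f(x; theta_0). Here only W1 and W2 are treated as trainable."""
    feats = []
    for x in X:
        h1 = W1 @ x                      # layer-1 pre-activations
        a1 = relu(h1)
        h2 = W2 @ a1                     # layer-2 pre-activations
        g2 = v * step(h2)                # backprop signal into layer 2
        g1 = (W2.T @ g2) * step(h1)      # backprop signal into layer 1
        grad_W2 = np.outer(g2, a1)       # df/dW2
        grad_W1 = np.outer(g1, x)        # df/dW1
        feats.append(np.concatenate([grad_W1.ravel(), grad_W2.ravel()]))
    return np.array(feats)

# Hypothetical ground truth: a linear target on unit-sphere inputs.
beta = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
X_train /= np.linalg.norm(X_train, axis=1, keepdims=True)
y_train = X_train @ beta
X_test = rng.normal(size=(n_test, d))
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)
y_test = X_test @ beta

Phi_tr = ntk_features(X_train)
Phi_te = ntk_features(X_test)

# Overfitted NTK model: the minimum-l2-norm interpolator of the training data.
theta = np.linalg.pinv(Phi_tr) @ y_train
print("train MSE:", np.mean((Phi_tr @ theta - y_train) ** 2))  # ~0: interpolation
print("test  MSE:", np.mean((Phi_te @ theta - y_test) ** 2))
```

The min-norm interpolator is used because gradient descent on a linearized (NTK) model initialized at zero converges to it; re-running the script with larger n1 or n2 lets one eyeball how test error moves with the width of each hidden layer, in the spirit of the bound described above, though the paper's formal statement concerns the exact NTK, not this finite-width sketch.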