CMStatistics 2022
Title: Benefit of interpolation in nearest neighbor algorithms
Authors: Yue Xing - Purdue University (United States) [presenting]
Qifan Song - Purdue University (United States)
Guang Cheng - University of California Los Angeles (United States)
Abstract: In some studies of deep learning, it is observed that over-parametrized deep neural networks achieve a small testing error even when the training error is almost zero. While numerous works aim to understand this so-called double descent phenomenon, we turn to another way to enforce zero training error (without over-parametrization): a data interpolation mechanism. Specifically, we consider a class of interpolated weighting schemes in nearest neighbor (NN) algorithms. By carefully characterizing the multiplicative constant in the statistical risk, we reveal a U-shaped performance curve over the level of data interpolation in both classification and regression setups. This sharpens the existing result that zero training error does not necessarily jeopardize predictive performance, and establishes a counter-intuitive result: a mild degree of data interpolation strictly improves the prediction performance and statistical stability over those of the (un-interpolated) k-NN algorithm. Finally, the universality of our results, e.g., under a change of distance measure and under corrupted testing data, will also be discussed.
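To make the idea of an interpolated weighting scheme concrete, here is a minimal sketch (not the authors' exact scheme) of a distance-weighted k-NN predictor whose weights diverge as the query approaches a training point, so the fitted function interpolates the training data; the parameter `gamma` is an assumed illustration of the "level of data interpolation", with `gamma = 0` recovering the standard unweighted k-NN average.

```python
import numpy as np

def interpolated_knn_predict(X_train, y_train, x, k=5, gamma=1.0):
    """Distance-weighted k-NN regression estimate at query point x.

    Weights are proportional to distance**(-gamma), so for gamma > 0 the
    weight of an exact match diverges and the fit interpolates the
    training labels (zero training error); gamma = 0 gives plain k-NN.
    This is an illustrative sketch, not the paper's exact weighting.
    """
    d = np.linalg.norm(X_train - x, axis=1)   # distances to all training points
    idx = np.argsort(d)[:k]                   # indices of the k nearest neighbors
    dk = d[idx]
    if dk[0] == 0.0:
        # query coincides with a training point: interpolate its label
        return float(y_train[idx[0]])
    w = dk ** (-gamma) if gamma > 0 else np.ones(k)
    return float(np.dot(w, y_train[idx]) / w.sum())
```

For example, querying exactly at a training point returns that point's label for any `gamma > 0`, while between two equidistant neighbors the estimate reduces to their average.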