Title: Classification with imperfect training labels
Authors: Timothy Cannings - University of Southern California (United States) [presenting]
Yingying Fan - University of Southern California (United States)
Richard Samworth - University of Cambridge (United Kingdom)
Abstract: The effect of imperfect training data labels on the performance of classification methods is studied. In a general setting, where the label errors occur at random and the probability that an observation is mislabelled depends on both the feature vector and the true label of the observation, we bound the average misclassification error of an arbitrary classifier trained with imperfect labels. Furthermore, under conditions on the labelling error probabilities, we derive the asymptotic properties of the popular $k$-nearest neighbour ($k$nn), support vector machine (SVM) and linear discriminant analysis (LDA) classifiers. The $k$nn and SVM classifiers are robust to imperfect training labels, in the sense that they remain Bayes consistent when trained with imperfectly labelled data. In fact, we see that in some cases imperfect labels can even improve the performance of these methods. On the other hand, the LDA classifier is not robust to label noise unless the prior probabilities of the classes are equal. Finally, our theoretical results are demonstrated via a simulation study.
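The contrast described above can be illustrated numerically. The following sketch is not the authors' actual simulation study; the Gaussian class distributions, class priors, noise rates, and choice of $k$ are all assumptions made for illustration. It generates feature-independent, class-dependent label noise with unequal class priors, then compares the test errors of $k$nn and LDA (via scikit-learn) trained on clean versus noisy labels.

```python
# Illustrative sketch only (not the authors' simulation): the class
# distributions, priors pi_1 = 0.2, noise rates rho_0 = 0.3 / rho_1 = 0.1,
# and k = 25 are assumptions chosen for demonstration.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

def sample(n, prior1=0.2):
    """Two Gaussian classes in R^2 with unequal prior probabilities."""
    y = (rng.random(n) < prior1).astype(int)
    X = rng.normal(size=(n, 2)) + 2.0 * y[:, None]  # class 1 shifted by (2, 2)
    return X, y

def corrupt(y, rho0=0.3, rho1=0.1):
    """Flip each label independently; the flip probability depends on the true class."""
    flip_prob = np.where(y == 0, rho0, rho1)
    return np.where(rng.random(len(y)) < flip_prob, 1 - y, y)

X_tr, y_tr = sample(2000)
X_te, y_te = sample(5000)
y_noisy = corrupt(y_tr)

errors = {}
for name, clf in [("knn", KNeighborsClassifier(n_neighbors=25)),
                  ("lda", LinearDiscriminantAnalysis())]:
    err_clean = np.mean(clf.fit(X_tr, y_tr).predict(X_te) != y_te)
    err_noisy = np.mean(clf.fit(X_tr, y_noisy).predict(X_te) != y_te)
    errors[name] = (err_clean, err_noisy)
    print(f"{name}: clean-label error {err_clean:.3f}, noisy-label error {err_noisy:.3f}")
```

Because the noise rates differ between classes, LDA's estimated priors and class means are biased, shifting its decision boundary; $k$nn, by contrast, still targets the majority vote of the (noisy) regression function, which under these noise conditions preserves the Bayes decision boundary.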