View Submission - SDS2022
Title: Perturbing data to address dataset shift in supervised classification
Authors: Laura Anderlucci - University of Bologna (Italy)
Angela Montanari - Alma Mater Studiorum - Università di Bologna (Italy) [presenting]
Abstract: In supervised classification, dataset shift occurs when the distribution of a single feature, of a combination of features, or of the class boundaries changes between the training set and the test set. As a result, the common assumption that training and test data follow the same distribution is often violated in real data applications. Dataset shift can arise for several reasons; the focus here is on ``covariate shift'', in which the conditional probability $p(y|x)$ remains unchanged but the input distribution $p(x)$ differs between the training and test sets. Random perturbation of variables or units when building the classifier can help address this issue. Evidence of the performance of the proposed approach is obtained on simulated and real data.
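To make the covariate shift setting concrete, the following is a minimal sketch of a simulation in which $p(y|x)$ is shared by both sets while $p(x)$ differs. It is not the authors' procedure: the logistic form of the conditional, the Gaussian input distributions, and the helper names `cond_prob`, `sample`, and `perturb` are all assumptions introduced for illustration, and the additive Gaussian noise is only one possible way to perturb variables.

```python
import math
import random


def cond_prob(x):
    """Shared conditional p(y=1 | x); assumed logistic, identical in both sets."""
    return 1.0 / (1.0 + math.exp(-2.0 * x))


def sample(n, mean, rng):
    """Draw n units with x ~ N(mean, 1) and y ~ Bernoulli(cond_prob(x))."""
    xs = [rng.gauss(mean, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < cond_prob(x) else 0 for x in xs]
    return xs, ys


def perturb(xs, scale, rng):
    """Hypothetical perturbation: add N(0, scale^2) noise to each feature value."""
    return [x + rng.gauss(0.0, scale) for x in xs]


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)

# Same p(y|x), different p(x): training inputs centred at -1, test inputs at +1.
x_train, y_train = sample(2000, -1.0, rng)
x_test, y_test = sample(2000, 1.0, rng)

# Perturbed copy of the training features, as might be used when fitting a classifier.
x_train_pert = perturb(x_train, 0.5, rng)

print("mean x (train, test):", mean(x_train), mean(x_test))
print("share of y=1 (train, test):", mean(y_train), mean(y_test))
```

Because the labels in both sets are drawn from the same `cond_prob`, any difference in class proportions between the two samples comes only from the shift in the input distribution.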