Workshop SDS (SDS2022)
A0191
Title: An EM algorithm for semi-supervised learning with data augmentation
Authors: Daniel Ahfock - University of Queensland (Australia) [presenting]; Geoffrey McLachlan - University of Queensland (Australia)
Abstract: A popular strategy for semi-supervised learning is to use unlabelled data to construct a regularization term that guides the training of a classification model. Data augmentation generates artificial unlabelled data by perturbing the available data. It is frequently combined with consistency regularization, which encourages the model to make similar predictions on the original and perturbed data. We propose a consistency regularization technique based on the Bhattacharyya coefficient for use with data augmentation, and we develop an EM algorithm to maximize the resulting regularized likelihood. The asymptotic variance of the regularized semi-supervised estimator can be related to the asymptotic variance of an unregularized supervised estimator computed from a completely classified sample. Theoretical analysis suggests that semi-supervised learning can be competitive with fully supervised learning under assumptions on the quality of the data augmentation procedure.
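The abstract does not specify the classification model or the exact form of the regularizer, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a multinomial logistic classifier and a penalty lam * sum_j log BC(p(x_j), p(x'_j)), where BC(p, q) = sum_k sqrt(p_k q_k) is the Bhattacharyya coefficient between the predicted class probabilities for an unlabelled point x_j and its perturbed copy x'_j. The EM step treats a shared class label for each (original, perturbed) pair as missing data: the E-step sets responsibilities proportional to sqrt(p_k q_k), which makes Jensen's lower bound on log BC tight, and the M-step takes a few gradient-ascent steps on that bound (a generalized EM, or minorize-maximize, step). The names `em_fit`, `objective`, `lam` and the Gaussian-jitter augmentation are hypothetical choices for illustration.

```python
import numpy as np


def softmax(logits):
    # Row-wise softmax; subtracting the row max avoids overflow in exp.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def bhattacharyya_coefficient(p, q):
    # BC(p, q) = sum_k sqrt(p_k * q_k) per row: 1 when p == q, 0 for disjoint supports.
    return np.sqrt(p * q).sum(axis=1)


def add_bias(X):
    # Append a constant column so each class gets an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])


def objective(W, A_lab, y, A_unl, A_aug, lam):
    # Hypothetical regularized log-likelihood:
    # labelled log-likelihood + lam * sum_j log BC(p(x_j), p(x'_j)).
    p_lab = softmax(A_lab @ W)
    loglik = np.log(p_lab[np.arange(len(y)), y]).sum()
    bc = bhattacharyya_coefficient(softmax(A_unl @ W), softmax(A_aug @ W))
    return loglik + lam * np.log(bc).sum()


def em_fit(X_lab, y, X_unl, X_aug, n_classes, lam=1.0, n_iter=30, inner_steps=10, lr=0.1):
    A_lab, A_unl, A_aug = add_bias(X_lab), add_bias(X_unl), add_bias(X_aug)
    W = np.zeros((A_lab.shape[1], n_classes))
    Y = np.eye(n_classes)[y]  # one-hot labels
    scale = len(y) + len(X_unl)  # normalizes the gradient step size
    history = [objective(W, A_lab, y, A_unl, A_aug, lam)]
    for _ in range(n_iter):
        # E-step: posterior over a latent shared class for each (original, perturbed) pair,
        # R_jk proportional to sqrt(p_k(x_j) * p_k(x'_j)).
        root = np.sqrt(softmax(A_unl @ W) * softmax(A_aug @ W))
        R = root / root.sum(axis=1, keepdims=True)
        # M-step (generalized): gradient ascent on the concave surrogate
        # sum_i log p_{y_i}(x_i) + lam * sum_{j,k} R_jk * 0.5 * (log p_k(x_j) + log p_k(x'_j)).
        for _ in range(inner_steps):
            grad = A_lab.T @ (Y - softmax(A_lab @ W))
            grad += 0.5 * lam * A_unl.T @ (R - softmax(A_unl @ W))
            grad += 0.5 * lam * A_aug.T @ (R - softmax(A_aug @ W))
            W = W + lr * grad / scale
        history.append(objective(W, A_lab, y, A_unl, A_aug, lam))
    return W, history


# Usage on synthetic two-class data with a small labelled subset.
rng = np.random.default_rng(0)
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
y_all = rng.integers(0, 2, size=200)
X_all = means[y_all] + rng.normal(size=(200, 2))
X_lab, y_lab = X_all[:10], y_all[:10]  # small labelled sample
X_unl = X_all[10:]  # labels discarded
X_aug = X_unl + 0.3 * rng.normal(size=X_unl.shape)  # hypothetical augmentation: Gaussian jitter
W, history = em_fit(X_lab, y_lab, X_unl, X_aug, n_classes=2)
```

Because the E-step makes the bound tight at the current parameters, each outer iteration cannot decrease the regularized objective as long as the M-step improves the surrogate, which the small step size is meant to ensure; `history` records the objective so this can be checked.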