CMStatistics 2021
Title: Statistical modelling with noisy labels: An application to fraud detection
Authors: Daniel Ahfock - University of Queensland (Australia) [presenting]
Min Zhu - University of Queensland (Australia)
Abstract: Label noise is a practical consideration in many applications of supervised learning. Ground-truth labels may not be readily obtainable because they are costly to acquire. This limitation arises in accounting fraud detection, where regulators do not have the resources to investigate every firm for evidence of fraud. As a result, some instances of fraud in the population may go undetected, so a firm that has not been cited is not necessarily free of fraud. There has been growing interest in developing classifiers that detect accounting fraud from the financial data in accounting reports. The training data consist of yearly financial data for each firm, together with an indicator of whether the firm was cited for accounting fraud by a regulator. We consider a statistical model for accounting fraud detection that allows for label noise in the training set. We propose to treat the ground-truth fraud status labels as latent variables and to model the label noise process. Maximum likelihood estimation can then be carried out using the expectation-maximisation (EM) algorithm. We show how our approach can be used to train generalised additive models given noisy labels. We present an application of the method to historical fraud detection data.
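A minimal sketch of the latent-label EM idea, under an assumed one-sided noise model: a firm that truly committed fraud (latent z = 1) is cited (observed y = 1) with an unknown constant probability rho, and a firm with no fraud is never cited. This noise model is an illustrative assumption, not necessarily the one used in the talk. For brevity the sketch swaps in a linear logistic predictor instead of a generalised additive model. In a GAM, X @ beta would be replaced by a sum of smooth functions of the covariates, and the beta update would become a penalised weighted fit. The function name em_noisy_logistic and the simulated data are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def em_noisy_logistic(X, y, n_iter=200, rho_init=0.5, tol=1e-8):
    """Hypothetical EM for logistic regression with under-reported positives.

    Assumed noise model (illustrative only):
      z_i ~ Bernoulli(sigmoid(x_i @ beta))   latent true fraud status
      y_i = 1 with probability rho if z_i = 1, and y_i = 0 if z_i = 0.
    """
    n, p = X.shape
    beta = np.zeros(p)
    rho = rho_init
    prev_ll = -np.inf
    ll = -np.inf
    for _ in range(n_iter):
        pi = sigmoid(X @ beta)
        # E-step: P(z = 1 | y, x). Cited firms (y = 1) are known frauds;
        # for uncited firms, Bayes' rule gives (1 - rho) pi / ((1 - rho) pi + 1 - pi).
        tau = np.where(y == 1, 1.0, (1 - rho) * pi / ((1 - rho) * pi + 1 - pi))
        # M-step for rho: detected frauds / expected number of true frauds.
        rho = y.sum() / tau.sum()
        # M-step for beta: a few Newton steps on the logistic log-likelihood
        # with fractional targets tau (a weighted/soft-label logistic fit).
        for _ in range(5):
            pi = sigmoid(X @ beta)
            grad = X.T @ (tau - pi)
            hess = X.T @ (X * (pi * (1 - pi))[:, None])
            beta = beta + np.linalg.solve(hess, grad)
        # Observed-data log-likelihood under the model: P(y = 1 | x) = rho * pi.
        q = rho * sigmoid(X @ beta)
        ll = np.sum(y * np.log(q) + (1 - y) * np.log1p(-q))
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return beta, rho, ll


# Usage on simulated (hypothetical) data: an intercept plus one covariate.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.random(n) < sigmoid(X @ np.array([-1.0, 2.0]))  # latent true labels
y = (z & (rng.random(n) < 0.6)).astype(float)           # only ~60% of frauds cited
beta_hat, rho_hat, ll = em_noisy_logistic(X, y)
```

The E-step only changes the targets for firms that were not cited; cited firms stay fixed as frauds. Because the beta update is a logistic fit with fractional targets tau, the same loop could in principle wrap any fitter that accepts soft labels or case weights, which is one way a GAM fitter might be slotted in.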