Title: Multi-class classification with imbalanced data: The choice of a categorical classifier
Authors: Silvia Golia - University of Brescia (Italy) [presenting]
Maurizio Carpita - University of Brescia (Italy)
Abstract: This work discusses the choice of the categorical classifier, that is, the procedure that transforms the probabilities assigned to all categories by a suitable method (the probabilistic classifier) into a single predicted class. The focus is on multi-class target variables, that is, variables with $k$ non-overlapping classes, where each unit must be classified into one, and only one, of them. The standard choice is the Bayes Classifier (BC), which assigns a unit to the class that the probabilistic classifier deems most likely. However, the BC has limitations with rare classes, because it favors the prevalent class; when there is no class of interest, or the class of interest is not the prevalent one, the BC may not be the best choice. The aim is to compare, through an extensive simulation study, the classification performance of the BC with that of two alternatives, the Max Difference Classifier (MDC) and the Max Ratio Classifier (MRC). The results show that, in terms of Macro Recall and F-score and of stability under increasing class imbalance, the MDC and MRC are better alternatives to the BC. Some real case studies confirm the simulation findings.
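The three decision rules named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the MDC and MRC, so the code assumes that each compares the predicted probability of a class with a reference probability (here the class prevalence, passed as `priors`), by difference and by ratio respectively. The function names, the choice of prevalence as reference, and the numbers in the example are illustrative assumptions, not the authors' implementation or data.

```python
from typing import Sequence


def bayes_classifier(probs: Sequence[float]) -> int:
    """BC: index of the class with the highest predicted probability."""
    return max(range(len(probs)), key=lambda j: probs[j])


def max_difference_classifier(probs: Sequence[float],
                              priors: Sequence[float]) -> int:
    """MDC (assumed form): class with the largest probs[j] - priors[j]."""
    return max(range(len(probs)), key=lambda j: probs[j] - priors[j])


def max_ratio_classifier(probs: Sequence[float],
                         priors: Sequence[float]) -> int:
    """MRC (assumed form): class with the largest probs[j] / priors[j].

    Assumes every reference probability is strictly positive.
    """
    return max(range(len(probs)), key=lambda j: probs[j] / priors[j])


def macro_recall(y_true: Sequence[int], y_pred: Sequence[int], k: int) -> float:
    """Unweighted mean of per-class recall over classes present in y_true."""
    recalls = []
    for c in range(k):
        n_c = sum(1 for t in y_true if t == c)
        if n_c == 0:
            continue  # skip classes with no true units
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / n_c)
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical three-class problem with one prevalent and one rare class.
    priors = [0.80, 0.15, 0.05]
    probs = [0.55, 0.15, 0.30]  # predicted probabilities for one unit

    print("BC :", bayes_classifier(probs))
    print("MDC:", max_difference_classifier(probs, priors))
    print("MRC:", max_ratio_classifier(probs, priors))
```

In this made-up example the BC picks the prevalent class (index 0) because its probability is the largest. Under the assumed definitions, the MDC and MRC both pick the rare class (index 2), whose predicted probability exceeds its prevalence by the widest margin. This is the kind of behavior the abstract describes when it says the BC favors the prevalent class.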