B0823
Title: Bootstrapping asymmetric binary regression models for massive unbalanced datasets
Authors: Marialuisa Restaino - University of Salerno (Italy) [presenting]
Marcella Niglio - University of Salerno (Italy)
Michele La Rocca - University of Salerno (Italy)
Abstract: Unbalanced binary data are characterized by far fewer events (ones) than non-events (zeros). Such response variables are difficult to predict and explain, especially in high-dimensional settings and in massive datasets, where the imbalance can be even more severe. The logistic model may not be appropriate for such data: it strongly underestimates the probability of the rare event because the estimators tend to be biased towards the majority class. Moreover, as underlined in the literature, the small-sample bias of the maximum likelihood estimators of the logistic regression parameters can be amplified when events are unbalanced. In this framework, there is therefore increasing interest in using asymmetric link functions to investigate the relationship between the binary response variable and a set of predictors. These link functions are characterized by a parameter that accommodates the imbalance in the response variable. The work aims to estimate the probability of an event (one) given a set of features by using asymmetric link functions for binary data, while also accounting for how class imbalance in categorical predictors affects the response variable. Confidence intervals and hypothesis tests are constructed using bootstrap methods specifically designed for massive datasets, from a multiple-testing perspective. The performance of the proposed procedure is evaluated by Monte Carlo simulation studies and applications to real datasets.
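To illustrate what an asymmetric link with an imbalance parameter can look like, the sketch below uses a power-logistic inverse link, p = (1 + exp(-eta))^(-alpha). This is an assumption chosen for illustration only: the abstract does not say which link family the authors use, and the names `power_logistic` and `alpha` are hypothetical. Setting alpha = 1 recovers the ordinary logistic model, while other values make the response curve asymmetric around eta = 0.

```python
import math


def power_logistic(eta: float, alpha: float) -> float:
    """Illustrative asymmetric inverse link: p = (1 + exp(-eta)) ** (-alpha).

    Hypothetical choice for this sketch, not the authors' stated link.
    alpha = 1 gives the standard (symmetric) logistic inverse link.
    """
    # Compute log(1 + exp(-eta)) without overflow for large |eta|.
    if eta >= 0:
        log_term = math.log1p(math.exp(-eta))
    else:
        log_term = -eta + math.log1p(math.exp(eta))
    return math.exp(-alpha * log_term)


if __name__ == "__main__":
    for alpha in (1.0, 0.3):
        for eta in (0.0, 2.0):
            # A symmetric link satisfies p(eta) + p(-eta) = 1 for every eta,
            # so a nonzero gap signals asymmetry.
            gap = power_logistic(eta, alpha) + power_logistic(-eta, alpha) - 1.0
            print(f"alpha={alpha}, eta={eta}, symmetry gap={gap:.4f}")
```

With alpha = 1 the symmetry gap is zero up to rounding; with alpha different from 1 it is not, which is the extra flexibility an asymmetric link offers when one class is much rarer than the other.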