CMStatistics 2022
Title: Doubly robust feature selection with mean and variance outlier detection and oracle properties
Authors: Luca Insolia - University of Geneva (Switzerland) [presenting]
Francesca Chiaromonte - The Pennsylvania State University (United States)
Runze Li - The Pennsylvania State University (United States)
Marco Riani - University of Parma (Italy)
Abstract: High-dimensional linear regression models are now pervasive in most research domains. We propose a general approach to handle data contamination that can disrupt the performance of feature selection and estimation procedures. Specifically, we consider the co-occurrence of mean-shift and variance-inflation outliers, which can be modeled as additional fixed and random components, respectively, and evaluated independently. Our proposal performs feature selection while detecting and down-weighting variance-inflation outliers, detecting and excluding mean-shift outliers, and retaining non-outlying cases with full weights. Feature selection and mean-shift outlier detection are performed through a robust class of nonconcave penalization methods. Variance-inflation outlier detection is based on the penalization of the restricted posterior mode. The resulting approach satisfies a robust oracle property for feature selection in the presence of data contamination, which allows the number of features to increase exponentially with the sample size, and it detects truly outlying cases of each type with asymptotic probability one. This provides an optimal trade-off between a high breakdown point and efficiency. Effective and computationally efficient heuristic procedures are also presented. We illustrate the finite-sample performance of our proposal through an extensive simulation study and real-world applications.
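The mean-shift side of the approach treats each outlier as an extra fixed parameter and lets a nonconcave penalty decide which ones are nonzero. A minimal sketch of that general idea, assuming the mean-shift model y = X beta + gamma + eps with a sparse case-specific shift vector gamma, is shown below. It is not the authors' procedure: the SCAD penalty on beta, the hard-thresholding rule on gamma, the hand-picked tuning values, the standardized no-intercept design, and all function names are illustrative assumptions. The variance-inflation (random-component) part, with its penalized restricted posterior mode, is not sketched here.

```python
import numpy as np


def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule (Fan and Li, 2001) for one standardized coordinate."""
    az = abs(z)
    if az <= 2 * lam:
        return np.sign(z) * max(az - lam, 0.0)  # soft-thresholding zone
    if az <= a * lam:
        return ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)  # transition zone
    return z  # large signals are left unshrunk


def hard_threshold(r, lam):
    """Hard thresholding: keep entries with |r| > lam, set the rest to zero."""
    return np.where(np.abs(r) > lam, r, 0.0)


def mean_shift_penalized_fit(X, y, lam_beta, lam_gamma, n_iter=200, tol=1e-8):
    """Hypothetical helper: alternate a SCAD step for beta and a hard-threshold step for gamma.

    Assumes the columns of X are standardized so that X[:, j] @ X[:, j] / n == 1
    and that the model has no intercept.
    """
    n, p = X.shape
    beta = np.zeros(p)
    gamma = np.zeros(n)
    for _ in range(n_iter):
        beta_old, gamma_old = beta.copy(), gamma.copy()
        # Feature-selection step: one sweep of coordinate descent on y - gamma.
        r = y - gamma - X @ beta
        for j in range(p):
            z = X[:, j] @ r / n + beta[j]
            new = scad_threshold(z, lam_beta)
            r -= X[:, j] * (new - beta[j])
            beta[j] = new
        # Outlier step: a nonzero gamma_i flags case i as a mean-shift outlier.
        gamma = hard_threshold(y - X @ beta, lam_gamma)
        change = max(np.max(np.abs(beta - beta_old)), np.max(np.abs(gamma - gamma_old)))
        if change < tol:
            break
    return beta, gamma


# Usage on simulated data: 2 active features out of 10, 5 shifted cases.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so X_j'X_j / n = 1
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)
y[:5] += 10.0  # mean-shift contamination of the first five cases

beta_hat, gamma_hat = mean_shift_penalized_fit(X, y, lam_beta=0.2, lam_gamma=2.5)
outliers = np.flatnonzero(gamma_hat)
print("selected features:", np.flatnonzero(beta_hat))
print("flagged outliers:", outliers)
```

Because gamma_i absorbs the full residual of every flagged case, those cases drop out of the beta update, which is one simple way to read "detecting and excluding mean-shift outliers". In practice lam_beta and lam_gamma would be chosen by a data-driven criterion rather than fixed by hand.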