CMStatistics 2020
Title: Corrected information criterion for coefficient selection and estimation in structured sparse linear regression
Authors: Bastien Marquis - Université libre de Bruxelles (Belgium) [presenting]
Maarten Jansen - ULB Brussels (Belgium)
Abstract: In high-dimensional linear regression, when the vector of coefficients is sparse, regularisation is widely used to obtain an estimate. In particular, $\ell_1$-regularisation has many attractive qualities: it selects the nonzero coefficients and is computationally efficient. However, its solutions are shrunk versions of the ordinary least squares estimator. This shrinkage biases the large coefficients and also leads to an overestimation of the model size when an information criterion is optimised. To prevent these effects, we propose to use $\ell_1$-regularisation only to select the nonzero coefficients, and to estimate the selected coefficients by a least-squares projection, thereby avoiding the shrinkage. The optimal balance between the residual sum of squares and the regularisation should then shift towards smaller models, which requires a correction of the expression of the information criterion. Examining the difference between the prediction error and the expected Mallows's $C_p$ leads to a corrected Mallows's $C_p$ for multiple linear regression. The correction is further analysed in structured models; in particular, group selection is considered.
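The two-step procedure described in the abstract (select with $\ell_1$-regularisation, then re-estimate the selected coefficients by least squares) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the authors' work: the function names (`lasso_cd`, `select_then_refit`, `classical_cp`), the plain coordinate-descent solver, the penalty scaling $\tfrac12\|y - X\beta\|^2 + \lambda\|\beta\|_1$, and the simulated data. The criterion shown is the classical, uncorrected Mallows's $C_p$ with a known noise variance; the corrected criterion proposed in the abstract is not reproduced here.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Illustrative solver only; any lasso implementation could be used instead.
    """
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]          # remove column j's contribution
            rho = X[:, j] @ resid               # correlation with partial residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add the updated contribution back
    return beta

def select_then_refit(X, y, lam):
    """Use the lasso only for selection, then refit the support by least squares."""
    beta_lasso = lasso_cd(X, y, lam)
    support = np.flatnonzero(beta_lasso)
    beta_ls = np.zeros(X.shape[1])
    if support.size > 0:
        coef = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
        beta_ls[support] = coef
    return beta_lasso, beta_ls, support

def classical_cp(X, y, beta, k, sigma2):
    """Classical (uncorrected) Mallows's Cp: RSS/n + 2*k*sigma2/n - sigma2.

    Assumes a known noise variance sigma2 and a model of size k chosen
    independently of y; the abstract argues this needs correcting when the
    model is selected from the data.
    """
    n = len(y)
    rss = float(np.sum((y - X @ beta) ** 2))
    return rss / n + 2.0 * k * sigma2 / n - sigma2

# Simulated sparse problem (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 3.0
y = X @ beta_true + rng.standard_normal(n)

beta_lasso, beta_ls, support = select_then_refit(X, y, lam=20.0)
print("selected indices:", support)
print("Cp, shrunk lasso estimate:", classical_cp(X, y, beta_lasso, support.size, 1.0))
print("Cp, least-squares refit:  ", classical_cp(X, y, beta_ls, support.size, 1.0))
```

On a fixed support, the least-squares refit can never have a larger residual sum of squares than the shrunk lasso estimate, so an uncorrected criterion evaluated on the refit looks more favourable for any given selection. This is consistent with the abstract's point that the criterion itself has to be adjusted once the shrinkage is removed.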