Title: Features selection and combination in high-dimensional data with the penalized Youden index
Authors: Claudio Junior Salaroli - Complutense University of Madrid (Spain) [presenting]
Maria del Carmen Pardo - Complutense University of Madrid (Spain)
Abstract: In high-dimensional classification settings such as -omics studies, where thousands of biomarkers are measured on only dozens of observations, it is crucial to combine regressors while filtering out the noise caused by thousands of irrelevant features. Regularization techniques are a popular way to achieve this: by adding a penalization term to the original optimization problem, they yield a sparse estimate, improving both classification performance and the interpretability of the result. The application of these techniques to the Youden index, i.e. the maximum vertical distance between the ROC curve and the chance line, is proposed. The resulting methodology, named Penalized Youden Index Estimator (PYE), selects and combines biomarkers simultaneously in a high-dimensional context while also identifying the optimal cut-off point. A further improvement treats the cut-off point as a function of patient characteristics, named covariates, such as sex, age, and habits like smoking or sports activity. This extended version of PYE is called Penalized Youden Index Estimator with Covariate-adjusted cut-off point (cPYE). The performance of these new approaches is compared with that of some popular existing methods, showing top performance in both selection and combination.
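
To make the quantity being penalized concrete, the following is a minimal sketch, not the authors' implementation. It computes the empirical Youden index (sensitivity + specificity - 1, maximized over the cut-off) for a linear combination of biomarkers and subtracts a sparsity penalty. The function names `youden_index` and `penalized_objective`, the choice of an L1 penalty, and the tuning parameter `lam` are assumptions made for illustration; the abstract does not state which penalty or which smoothing of the index PYE uses.

```python
import numpy as np

def youden_index(scores, labels):
    """Empirical Youden index J = max_c {sensitivity(c) + specificity(c) - 1}.

    A subject is classified as positive when score > c.
    Returns (J, c) for the best cut-off among the observed scores.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_j, best_c = -np.inf, None
    for c in np.unique(scores):
        sens = np.mean(scores[labels] > c)     # true positive rate
        spec = np.mean(scores[~labels] <= c)   # true negative rate
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_c = j, c
    return best_j, best_c

def penalized_objective(beta, X, labels, lam):
    """Youden index of the combination X @ beta minus an L1 penalty.

    The L1 penalty is an assumption chosen for illustration only.
    Returns (objective value, optimal cut-off).
    """
    beta = np.asarray(beta, dtype=float)
    scores = np.asarray(X, dtype=float) @ beta
    j, c = youden_index(scores, labels)
    return j - lam * np.sum(np.abs(beta)), c

if __name__ == "__main__":
    # Toy data: the first column separates the classes, the second is noise.
    X = np.array([[0.1, 5.0], [0.2, -3.0], [0.3, 1.0],
                  [0.8, 2.0], [0.9, -4.0], [0.7, 0.0]])
    y = np.array([0, 0, 0, 1, 1, 1])
    print(penalized_objective([1.0, 0.0], X, y, lam=0.5))
```

The empirical index is a step function of the coefficients, so a practical estimator would probably maximize a smoothed surrogate instead; this sketch only evaluates the objective at a given coefficient vector.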