CMStatistics 2021
Title: Co-data learning in ridge models for high-dimensional data
Authors: Mirrelijn van Nee - Amsterdam University Medical Centers (Netherlands) [presenting]
Mark van de Wiel - Amsterdam University Medical Centers (Netherlands)
Abstract: Prediction is hard when data are high-dimensional, but additional information, such as domain knowledge and previously published studies, may help to improve predictions. Such complementary data, or co-data, provide information on the covariates, for example genomic location or p-values from external studies in cancer genomics. We use multiple and diverse co-data to define possibly overlapping or hierarchically structured groups of covariates. These groups are then used to estimate adaptive multi-group ridge penalties for generalised linear and Cox models. Available group-adaptive methods primarily target settings with few groups; they are therefore likely to overfit when groups are non-informative, correlated or numerous, and they do not account for known structure at the group level. To handle these issues, our method combines empirical Bayes estimation of the hyperparameters with an extra level of flexible shrinkage. This yields a uniquely flexible framework, as any type of shrinkage can be used at the group level. We describe various types of co-data and propose suitable forms of hypershrinkage. The method is versatile: it allows for the integration and weighting of multiple co-data sets, the inclusion of unpenalised covariates, and posterior variable selection. As an illustrative example, we demonstrate the method in an oncogenomics setting.
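The core idea of estimating one ridge penalty per co-data group by empirical Bayes can be illustrated with a minimal sketch. It assumes a linear model with known noise variance sigma2 and a single co-data source that splits the covariates into non-overlapping groups. The sketch maximises the marginal likelihood over a grid of group prior variances tau2_g and then fits a generalised ridge with penalties lambda_g = sigma2 / tau2_g. All function names, the grid search, and the `hyper` penalty on the spread of log group variances are hypothetical illustrations. This is not the authors' estimator, which also covers generalised linear and Cox models, overlapping or hierarchical groups, and multiple co-data sources, and it is not the API of any package.

```python
import itertools

import numpy as np


def group_ridge(X, y, groups, group_penalties):
    """Generalised ridge estimate with one penalty per covariate group.

    groups: integer array (length p) giving the group index of each covariate.
    group_penalties: one non-negative penalty per group; larger means more shrinkage.
    """
    lam = np.asarray(group_penalties, dtype=float)[np.asarray(groups)]
    # Closed-form solution of (X^T X + diag(lam)) beta = X^T y
    return np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)


def log_marginal_likelihood(X, y, groups, tau2, sigma2):
    """Log density of y when beta_k ~ N(0, tau2[group of k]) and noise ~ N(0, sigma2)."""
    d = np.asarray(tau2, dtype=float)[np.asarray(groups)]
    # Marginal covariance of y: sigma2 * I + X diag(d) X^T
    C = sigma2 * np.eye(len(y)) + (X * d) @ X.T
    _, logdet = np.linalg.slogdet(C)
    alpha = np.linalg.solve(C, y)
    return -0.5 * (logdet + y @ alpha + len(y) * np.log(2 * np.pi))


def empirical_bayes_group_variances(X, y, groups, sigma2, grid, hyper=0.0):
    """Grid search over per-group prior variances (type-II maximum likelihood).

    hyper >= 0 subtracts a ridge-type penalty on the spread of the log group
    variances, pulling them towards a common value. This is a hypothetical
    stand-in for group-level hypershrinkage, not the method in the abstract.
    """
    n_groups = int(np.max(groups)) + 1
    best, best_score = None, -np.inf
    for tau2 in itertools.product(grid, repeat=n_groups):
        log_tau2 = np.log(tau2)
        spread = np.sum((log_tau2 - log_tau2.mean()) ** 2)
        score = log_marginal_likelihood(X, y, groups, tau2, sigma2) - hyper * spread
        if score > best_score:
            best, best_score = np.array(tau2), score
    return best


# Simulated example: two hypothetical co-data groups, only group 0 carries signal.
rng = np.random.default_rng(0)
n, p = 100, 40
groups = np.repeat([0, 1], p // 2)
X = rng.standard_normal((n, p))
beta = np.where(groups == 0, rng.standard_normal(p), 0.0)
y = X @ beta + rng.standard_normal(n)

grid = [0.001, 0.01, 0.1, 1.0, 10.0]
tau2_hat = empirical_bayes_group_variances(X, y, groups, sigma2=1.0, grid=grid)
# Penalty per group: lambda_g = sigma2 / tau2_g (sigma2 = 1 here)
beta_hat = group_ridge(X, y, groups, 1.0 / tau2_hat)
```

With `hyper=0` each group receives its own prior variance, so the uninformative group is shrunk heavily towards zero. Increasing `hyper` pulls the group variances towards a common value, which mirrors, in a very simplified way, how an extra level of shrinkage can guard against overfitting when groups are many or uninformative.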