Title: Variable selection for high-dimensional features with missing data
Authors: Kin Yau Wong - Hong Kong Polytechnic University (Hong Kong) [presenting]
Abstract: In biomedical, epidemiological, or social studies, one often encounters high-dimensional data with missing values. Conventional methods for handling high-dimensional data, such as penalization methods, are not directly applicable to problems with missing data. Simple methods for handling missing data, such as complete-case analysis and single imputation, are generally inefficient and may even be invalid. We consider a regression framework with incomplete predictors and propose a latent variable model to characterize the relationships among predictors and to infer missing values from the observed data. Under this framework, we propose a penalized estimator of the regression parameters and develop a computationally efficient Expectation-Maximization (EM) algorithm to compute it. We demonstrate the satisfactory performance of the proposed methods through simulation studies and provide an application to a motivating cancer study in which substantial proportions of the genomic data are missing.
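The abstract does not specify the latent variable model, the penalty, or the algorithmic details, so the following is only a rough illustration of the general idea of combining EM with penalized regression when predictors are incomplete. It is a minimal sketch under swapped-in assumptions, not the authors' method: the predictors are assumed jointly Gaussian, x_i ~ N(mu, Sigma), in place of the proposed latent variable model; the outcome is assumed Gaussian, y_i | x_i ~ N(b0 + x_i'beta, sigma2); values are assumed missing completely at random (encoded as NaN); and a lasso penalty is assumed. The E-step computes the conditional mean and covariance of each subject's missing predictors given its observed predictors and outcome. The M-step updates (mu, Sigma) from the expected sufficient statistics and then runs coordinate descent for the lasso on the expected squared-error loss. The names `penalized_em` and `soft_threshold` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Lasso soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def penalized_em(X, y, lam, n_iter=100, n_cd=50):
    """Hypothetical EM-type lasso regression with missing predictors (NaN in X).

    Assumed model (illustrative only, not the authors' latent variable model):
      x_i ~ N(mu, Sigma),  y_i | x_i ~ N(b0 + x_i @ beta, sigma2),  MCAR missingness.
    """
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    Sigma = np.diag(np.nanvar(X, axis=0))
    beta, b0, sigma2 = np.zeros(p), y.mean(), y.var()

    for _ in range(n_iter):
        # ---- E-step: condition missing x's on observed x's and y ----
        EX = np.where(miss, 0.0, X)   # conditional means; observed entries kept
        C_sum = np.zeros((p, p))      # sum of conditional covariances (missing blocks)
        # Joint mean and covariance of z = (x, y) under current parameters.
        m_z = np.append(mu, b0 + mu @ beta)
        S_z = np.empty((p + 1, p + 1))
        S_z[:p, :p] = Sigma
        S_z[:p, p] = S_z[p, :p] = Sigma @ beta
        S_z[p, p] = beta @ Sigma @ beta + sigma2
        for i in range(n):
            m = np.flatnonzero(miss[i])
            if m.size == 0:
                continue
            o = np.append(np.flatnonzero(~miss[i]), p)  # observed x indices plus y
            z_o = np.append(X[i, o[:-1]], y[i])
            # K = S_mo @ inv(S_oo), computed with a linear solve (S_oo is symmetric).
            K = np.linalg.solve(S_z[np.ix_(o, o)], S_z[np.ix_(o, m)]).T
            EX[i, m] = m_z[m] + K @ (z_o - m_z[o])
            C_sum[np.ix_(m, m)] += S_z[np.ix_(m, m)] - K @ S_z[np.ix_(o, m)]

        # ---- M-step ----
        mu = EX.mean(axis=0)
        Sigma = (EX.T @ EX + C_sum) / n - np.outer(mu, mu)
        c = EX.T @ y / n - mu * y.mean()  # expected cross-covariance of x and y
        # Coordinate descent for 0.5 * beta' Sigma beta - c' beta + lam * ||beta||_1,
        # which is the expected squared-error loss after profiling out the intercept.
        for _ in range(n_cd):
            for j in range(p):
                partial = c[j] - Sigma[j] @ beta + Sigma[j, j] * beta[j]
                beta[j] = soft_threshold(partial, lam) / Sigma[j, j]
        b0 = y.mean() - mu @ beta
        r = y - b0 - EX @ beta
        # Expected residual variance includes the uncertainty in the missing x's.
        sigma2 = (r @ r + beta @ C_sum @ beta) / n

    return b0, beta, sigma2, mu, Sigma
```

Conditioning on the outcome as well as the observed predictors is what distinguishes this from single imputation: each imputed value carries its conditional covariance into the M-step, so the uncertainty about missing entries enters both the covariance estimate and the residual variance. A production implementation for genuinely high-dimensional predictors would need a more structured (for example, low-rank) covariance model than the unrestricted Sigma used here.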