CMStatistics 2022
Title: Multi-study factor regression models for heterogenous data: Applications to cancer genomics and nutritional epidemiology
Authors:
Alejandra Avalos Pacheco - Vienna University of Technology (Austria) [presenting]
David Rossell - Universitat Pompeu Fabra (Spain)
Roberta De Vito - Brown University (United States)
Jack Jewson - Universitat Pompeu Fabra and Barcelona Graduate School of Economics (Spain)
Richard S Savage - University of Warwick (United Kingdom)
Abstract: Data integration of multiple studies can be key to understanding and gaining knowledge in statistical research. However, such data present both biological and artifactual sources of variation, also known as covariate effects. Covariate effects can be complex, leading to systematic biases. We will present novel sparse latent factor regression (FR) and multi-study factor regression (MSFR) models to integrate such heterogeneous data. The FR model provides a tool for data exploration via dimensionality reduction and sparse low-rank covariance estimation while correcting for a range of covariate effects. MSFR models are extensions of FR that enable us to jointly (i) capture common components across studies, (ii) isolate the sources of variation that are unique to each study, and (iii) correct for non-biological sources of variation. We will discuss the use of several sparse priors (local and non-local) to learn the dimension of the latent factors. The approach provides a flexible methodology for sparse factor regression, which is not limited to data with covariate effects. We will present several examples, with a focus on bioinformatics applications. The results show an increase in the accuracy of the dimensionality reduction, with non-local priors substantially improving the reconstruction of factor cardinality. The results of our analyses also illustrate how failing to account for covariate effects properly can result in unreliable inference.
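The three-part decomposition described in the abstract can be illustrated with a small simulation. The abstract does not state the model equations, so the sketch below assumes a common multi-study factor formulation in which each observation in study s is the sum of a covariate term, a term driven by factors shared across studies, a term driven by study-specific factors, and idiosyncratic noise. All names (`simulate_msfr`, `implied_covariance`, `phi`, `lam`, `beta`, `psi`) are hypothetical and chosen for illustration; no sparse priors or model fitting are included.

```python
import numpy as np


def simulate_msfr(n_per_study, p=20, k_common=3, k_specific=2, n_cov=2, seed=0):
    """Simulate data from an ASSUMED multi-study factor regression form:

        x = beta @ b + phi @ f + lam_s @ l + e

    beta: covariate coefficients, phi: loadings shared by all studies,
    lam_s: loadings unique to study s, e: noise with diagonal covariance.
    This is an illustrative formulation, not taken from the abstract.
    """
    rng = np.random.default_rng(seed)
    phi = rng.normal(size=(p, k_common))   # common loadings (shared)
    beta = rng.normal(size=(p, n_cov))     # covariate effects (shared)
    studies = []
    for n_s in n_per_study:
        lam = rng.normal(size=(p, k_specific))   # study-specific loadings
        psi = rng.uniform(0.5, 1.5, size=p)      # idiosyncratic variances
        b = rng.normal(size=(n_s, n_cov))        # observed covariates
        f = rng.normal(size=(n_s, k_common))     # common latent factors
        l = rng.normal(size=(n_s, k_specific))   # study-specific latent factors
        e = rng.normal(size=(n_s, p)) * np.sqrt(psi)
        x = b @ beta.T + f @ phi.T + l @ lam.T + e
        studies.append({"x": x, "b": b, "lam": lam, "psi": psi})
    return phi, beta, studies


def implied_covariance(phi, lam, psi):
    """Covariance of x given the covariates under the assumed model:
    a low-rank common part, a low-rank study-specific part, and a diagonal."""
    return phi @ phi.T + lam @ lam.T + np.diag(psi)


if __name__ == "__main__":
    phi, beta, studies = simulate_msfr([100, 150])
    sigma_0 = implied_covariance(phi, studies[0]["lam"], studies[0]["psi"])
    print(sigma_0.shape)
```

Under this assumed form, the covariance of each study splits into a part common to all studies (`phi @ phi.T`) and a part unique to that study (`lam @ lam.T`), which is the sense in which shared and study-specific sources of variation can be separated once covariate effects are removed.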