Title: A Bayesian sparse latent factor model for identification of cancer subgroups with data integration
Authors: Zequn Sun - Medical University of South Carolina (United States)
Brian Neelon - Medical University of South Carolina (United States)
Dongjun Chung - Ohio State University (United States) [presenting]
Abstract: Identification of cancer subgroups is critical for developing precise therapeutic strategies for various types of cancer. The Cancer Genome Atlas (TCGA) has generated a tremendous amount of high-throughput genomic data, profiling somatic mutation, copy number alteration, DNA methylation, and gene expression for each patient. This large-scale cancer genomic data provides an unprecedented opportunity to investigate cancer subgroups using integrative approaches based on multiple types of genomic data. We will discuss our recent work on a Bayesian sparse latent factor model for simultaneous identification of cancer subgroups (clustering) and key molecular features (variable selection), based on a joint analysis of continuous, binary, and count data. In addition, by utilizing pathway (variable group) information, this approach not only improves the accuracy and robustness of identifying cancer subgroups and key molecular features, but also facilitates biological interpretation of the novel findings it generates. Finally, to facilitate efficient posterior sampling, a heavy-tailed prior is specified for continuous data, while Gibbs samplers based on Polya-Gamma mixtures of normal densities are proposed for binary and count data. We will illustrate the proposed statistical model with simulation studies and an application to the TCGA data.
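To make the Polya-Gamma step more concrete, here is a minimal Python sketch of Polya-Gamma data augmentation for a plain Bayesian logistic regression with a single binary outcome. It is not the authors' sparse latent factor model or their sampler. Assumptions: a N(0, prior_var * I) prior on the coefficients, and PG(b, c) variables drawn from a truncated sum-of-gammas series (the hypothetical helper `sample_pg_approx`, truncated at `n_terms`), which is an approximation rather than an exact Polya-Gamma sampler. The names `sample_pg_approx` and `gibbs_logistic_pg` are illustrative only.

```python
import numpy as np


def sample_pg_approx(b, c, rng, n_terms=100):
    """Approximate draws from PG(b, c) via a truncated sum-of-gammas series.

    PG(b, c) = 1/(2*pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4*pi^2)),
    with g_k ~ Gamma(b, 1). Truncating at n_terms (an assumption of this
    sketch) slightly underestimates each draw; exact samplers avoid this.
    """
    c = np.atleast_1d(np.asarray(c, dtype=float))
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(shape=b, scale=1.0, size=c.shape + (n_terms,))
    denom = (k - 0.5) ** 2 + (c[..., None] ** 2) / (4 * np.pi ** 2)
    return (g / denom).sum(axis=-1) / (2 * np.pi ** 2)


def gibbs_logistic_pg(X, y, n_iter=1000, prior_var=10.0, seed=0):
    """Polya-Gamma augmented Gibbs sampler for Bayesian logistic regression.

    Model: y_i ~ Bernoulli(sigmoid(x_i' beta)), beta ~ N(0, prior_var * I).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kappa = y - 0.5                        # kappa_i = y_i - 1/2
    prior_prec = np.eye(p) / prior_var
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # Step 1: omega_i | beta ~ PG(1, x_i' beta)
        omega = sample_pg_approx(1.0, X @ beta, rng)
        # Step 2: beta | omega, y ~ N(V X' kappa, V),
        # where V = (X' Omega X + prior_prec)^{-1}
        prec = X.T @ (omega[:, None] * X) + prior_prec
        mean = np.linalg.solve(prec, X.T @ kappa)
        L = np.linalg.cholesky(prec)           # prec = L L'
        z = rng.standard_normal(p)
        beta = mean + np.linalg.solve(L.T, z)  # covariance (L L')^{-1} = V
        draws[t] = beta
    return draws


if __name__ == "__main__":
    # Simulated data with a known coefficient vector (illustrative only)
    rng = np.random.default_rng(1)
    n = 400
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    true_beta = np.array([-0.5, 1.5])
    prob = 1.0 / (1.0 + np.exp(-X @ true_beta))
    y = rng.binomial(1, prob).astype(float)
    draws = gibbs_logistic_pg(X, y)
    # Posterior means after discarding 200 burn-in iterations
    print(draws[200:].mean(axis=0))
```

Conditioning on the latent ω makes the coefficient update Gaussian, which is what yields closed-form Gibbs steps for binary data. If count data are modeled with a negative binomial likelihood (an assumption here, since the abstract does not name the count model), an analogous update uses PG(y_i + r, ·) draws with κ_i = (y_i - r)/2.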