Title: Improved classification accuracy through inclusion of latent variables
Authors: Johann Gagnon-Bartsch - University of Michigan (United States) [presenting]
Yujia Pan - University of Michigan (United States)
Abstract: Classification of high-throughput genomic data is challenging because the signal is often weak and sparse. Incorporating side information or additional covariates (e.g., gender, age) can improve predictive accuracy, but such information is often unknown. To address this, we introduce a classifier that adaptively leverages both observed variables and inferred latent ones. Including these latent variables tends to improve accuracy, sometimes substantially, as illustrated on several simulated and genomic datasets. A diverse collection of genomic datasets is considered (gene expression, methylation, and SNP data), along with a wide range of disease phenotypes (asthma, Alzheimer's disease, tuberculosis, and schizophrenia), illustrating broad applicability.