CMStatistics 2022
B1177
Title: Structured factorization for single-cell gene expression data
Authors: Luisa Galtarossa - University of Padua (Italy)
Antonio Canale - University of Padua (Italy)
Davide Risso - University of Padua (Italy)
Lorenzo Schiavon - University of Padova (Italy) [presenting]
Abstract: Single-cell gene expression experiments yield count data characterized by both high dimensionality and high complexity, with tens of thousands of cells and genes. In this context, factorization models are a powerful tool for condensing the available information through a sparse decomposition into lower-rank matrices. We adapt a recent Bayesian class of generalized factorization models to count data and implement it, specifically to model the covariance between genes. The methodology also allows exogenous information about genes to be included in the prior, so that the recognition of covariance structures between similar genes is favoured. We use biological pathways as external information to induce sparsity patterns in the loadings matrix. This also helps to assign a meaning to the columns of the loadings matrix and, consequently, to the corresponding latent factors, which can be interpreted as unobserved cell covariates. We apply the model to scRNA-seq data collected on lung adenocarcinoma cell lines, showing promising results on the role of pathways in characterizing the relations between genes and in extracting valuable insights about unobserved cell traits.
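As an illustration of the idea only, the following minimal sketch simulates counts from a factor model whose loadings are zeroed outside a pathway membership mask. It is not the authors' model or prior: the Poisson likelihood with a log link, the hard 0/1 mask (the abstract describes a prior that favours such patterns, not a fixed mask), the toy pathway assignment, and all names (`simulate_structured_counts`, `pathway_mask`) are assumptions made for this example.

```python
import numpy as np


def simulate_structured_counts(n_cells, pathway_mask, seed=0):
    """Simulate counts from a Poisson factor model with masked loadings.

    pathway_mask is a boolean (n_genes, n_factors) array; entry (j, h) is
    True when gene j belongs to the pathway tied to factor h (hypothetical).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_factors = pathway_mask.shape
    # Latent factors: one unobserved covariate per cell and per pathway.
    factors = rng.normal(size=(n_cells, n_factors))
    # Loadings are forced to zero wherever a gene is outside the pathway.
    loadings = rng.normal(scale=0.5, size=(n_genes, n_factors)) * pathway_mask
    # Assumed log link: log-rate of each count is the low-rank product.
    log_rate = factors @ loadings.T
    counts = rng.poisson(np.exp(log_rate))
    return counts, factors, loadings


# Toy example: 6 genes, 2 pathways; the last gene belongs to no pathway.
pathway_mask = np.array(
    [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 0]], dtype=bool
)
counts, factors, loadings = simulate_structured_counts(200, pathway_mask)

# With unit-variance factors, the gene-gene covariance implied on the
# log-rate scale is loadings @ loadings.T; it vanishes across pathways.
implied_cov = loadings @ loadings.T
```

In this toy setting, genes that share a pathway load on the same factor and therefore co-vary, while genes in different pathways have zero implied covariance, which is what makes each loadings column interpretable as a pathway-level cell trait.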