CMStatistics 2022
B1020
Title: A simple method for removing batch effects from single-cell RNA-sequencing data
Authors: Jun Li - University of Notre Dame (United States) [presenting]
Dailin Gan - University of Notre Dame (United States)
Abstract: Integrative analysis of multiple single-cell RNA-sequencing datasets allows for more comprehensive characterization of cell types. However, systematic technical differences between datasets, known as "batch effects", must be removed before integration to avoid misleading interpretations of the data. Although many batch-effect-removal methods have been developed, there is still considerable room for improvement: most existing methods output only dimension-reduced data rather than the expression of individual genes, rely on computationally demanding models, and are black-box models that are difficult to interpret or tune. We present a new batch-effect-removal method called SCIBER and study its performance on real datasets. SCIBER matches cell clusters across batches according to the overlap of their differentially expressed genes. SCIBER is a simple algorithm that scales better to data with a large number of cells and is easy to tune, yet on real datasets it removes batch effects with accuracy comparable to, and sometimes better than, that of state-of-the-art methods that are much more complicated. Moreover, SCIBER outputs the expression of individual genes, which can be used directly in downstream analyses. Additionally, SCIBER is a reference-based method: it designates one of the batches as the reference batch and keeps it untouched throughout the process, which makes it especially suitable for integrating user-generated datasets with standard reference data.
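The idea of matching clusters by the overlap of their differentially expressed genes, with a reference batch left untouched, can be illustrated with a minimal Python/NumPy sketch. This is not SCIBER's implementation. It assumes cells have already been clustered within each batch, and the DE score (difference of cluster means versus the rest of the batch), the Jaccard overlap of the top `n_top` genes, and the per-cluster mean shift used for correction are all simplifying assumptions; the function names are hypothetical. It only mirrors two properties stated in the abstract: the reference batch is returned unchanged, and the output is expression for individual genes rather than a dimension-reduced embedding.

```python
import numpy as np


def top_de_genes(expr, labels, cluster, n_top):
    """Hypothetical DE score: mean in the cluster minus mean in the rest of the batch.
    Returns the indices of the n_top highest-scoring genes as a set."""
    in_c = labels == cluster
    score = expr[in_c].mean(axis=0) - expr[~in_c].mean(axis=0)
    return set(np.argsort(score)[::-1][:n_top].tolist())


def match_clusters(ref_expr, ref_labels, qry_expr, qry_labels, n_top=50):
    """Map each query cluster to the reference cluster whose top DE genes
    overlap most with its own (Jaccard index; an assumed overlap measure)."""
    ref_sets = {int(c): top_de_genes(ref_expr, ref_labels, c, n_top)
                for c in np.unique(ref_labels)}
    matches = {}
    for q in np.unique(qry_labels):
        q_set = top_de_genes(qry_expr, qry_labels, q, n_top)
        jaccard = {c: len(q_set & s) / len(q_set | s) for c, s in ref_sets.items()}
        matches[int(q)] = max(jaccard, key=jaccard.get)
    return matches


def correct_query(ref_expr, ref_labels, qry_expr, qry_labels, n_top=50):
    """Assumed correction: shift each query cluster so its per-gene mean equals
    that of its matched reference cluster. The reference is returned untouched."""
    matches = match_clusters(ref_expr, ref_labels, qry_expr, qry_labels, n_top)
    corrected = qry_expr.astype(float).copy()
    for q, r in matches.items():
        in_q = qry_labels == q
        shift = ref_expr[ref_labels == r].mean(axis=0) - qry_expr[in_q].mean(axis=0)
        corrected[in_q] += shift
    return ref_expr, corrected


# Synthetic demo: two cell types, each marked by 5 of 20 genes.
rng = np.random.default_rng(0)
n_genes = 20


def simulate(n_cells, marker_genes, offset):
    """Gaussian noise plus a constant batch offset; marker genes raised by 3."""
    x = rng.normal(0.0, 0.1, size=(n_cells, n_genes)) + offset
    x[:, marker_genes] += 3.0
    return x


type_a, type_b = [0, 1, 2, 3, 4], [10, 11, 12, 13, 14]
ref = np.vstack([simulate(30, type_a, 0.0), simulate(30, type_b, 0.0)])
ref_lab = np.array([0] * 30 + [1] * 30)
# Query batch: cluster IDs are arbitrary (cluster 0 is type B) and every gene is offset by 1.5.
qry = np.vstack([simulate(40, type_b, 1.5), simulate(40, type_a, 1.5)])
qry_lab = np.array([0] * 40 + [1] * 40)

matches = match_clusters(ref, ref_lab, qry, qry_lab, n_top=5)
print(matches)  # {0: 1, 1: 0}
ref_out, qry_corrected = correct_query(ref, ref_lab, qry, qry_lab, n_top=5)
```

Because the batch offset is shared by all clusters within the query batch, it cancels out of the within-batch DE score, which is why matching on DE-gene overlap can succeed even when raw expression levels differ between batches. A simple mean shift like this one preserves within-cluster variation but is only one of many possible correction steps.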