Title: A variance component score test applied to RNA-Seq differential analysis
Authors: Boris Hejblum - Université de Bordeaux, Inserm BPH U1219, Inria SISTM, Vaccine Research Institute (France) [presenting]
Marine Gauthier - University of Bordeaux (France)
Rodolphe Thiebaut - Bordeaux University U 1219 INSERM (France)
Denis Agniel - RAND Corporation (United States)
Abstract: Gene expression measurement has shifted from microarrays to next-generation RNA sequencing, producing ever richer high-throughput data for transcriptomics studies. As such studies grow in size, frequency, and importance, there is an urgent need for statistical methods that better control the type I error. We model transformed RNA-Seq counts as continuous variables, using nonparametric regression to account for their inherent heteroscedasticity in a principled, model-free, and efficient manner. We rely on a powerful variance component score test that handles both covariate adjustment and data heteroscedasticity to identify the genes whose expression is significantly associated with one or several factors of interest in complex experimental designs (including longitudinal data). Our test statistic has a simple form and a simple limiting distribution, both of which can be computed quickly. A permutation version of the test is also derived for small sample sizes. In simulations, our test shows very good statistical properties, with greater stability and power than the state-of-the-art methods limma/voom, edgeR, and DESeq2. In particular, we show that all three of those methods can fail to control the type I error as the sample size grows, while our method continues to control it. We apply the proposed method to two public datasets: one with repeated measurements from a study of a candidate vaccine against Ebola, and one from a tuberculosis study.
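As a rough illustration of the idea of a permutation score test with precision weights, the following is a minimal sketch for a single gene and a single factor of interest. It is not the authors' statistic or implementation: the function names (score_statistic, permutation_pvalue), the weighted least-squares null fit, and the squared-score form are all assumptions chosen for simplicity, and permuting the tested variable is only valid when it is exchangeable with respect to the adjustment covariates (for instance, when the only covariate is an intercept).

```python
import numpy as np


def score_statistic(y, x, covariates, weights):
    """Squared weighted score for the association between y and x.

    y          : (n,) transformed expression of one gene
    x          : (n,) variable of interest being tested
    covariates : (n, p) adjustment covariates, including an intercept column
    weights    : (n,) precision weights (hypothetical; e.g. inverse variances)
    """
    # Fit the null model (covariates only) by weighted least squares.
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(covariates * sw[:, None], y * sw, rcond=None)
    resid = y - covariates @ beta
    # Score: weighted cross-product of the tested variable and null residuals.
    score = np.sum(weights * x * resid)
    return score ** 2 / len(y)


def permutation_pvalue(y, x, covariates, weights, n_perm=999, seed=0):
    """Permutation p-value obtained by shuffling the tested variable x."""
    rng = np.random.default_rng(seed)
    observed = score_statistic(y, x, covariates, weights)
    exceed = 0
    for _ in range(n_perm):
        permuted = score_statistic(y, rng.permutation(x), covariates, weights)
        if permuted >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

In a genome-wide analysis a function like this would be applied gene by gene, followed by a multiple-testing correction; with larger sample sizes an asymptotic reference distribution would replace the permutation loop.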