Title: Structured feature aggregation in regression with microbiome data
Authors: Jacob Bien - University of Southern California (United States) [presenting]
Christian L. Mueller - Simons Foundation (United States)
Xiaohan Yan - Cornell University (United States)
Abstract: Microbiome data allow marine ecologists to understand the composition of microbes across time and space in the ocean. Much statistical work has focused on two technical challenges in the analysis of microbiome data: (i) the data are high-dimensional, i.e., there are a large number of microbes, and (ii) it is not meaningful to directly compare the raw absolute abundances measured, so compositional data methods are used. The focus is on a third major challenge, which has received far less attention than the other two: microbiome data are highly sparse, i.e., the vast majority of microbes measured are present in only a few samples. Researchers generally take ad hoc approaches such as filtering out rare microbes or manually aggregating them to the genus or family level. We propose instead a principled regression framework, which we demonstrate addresses all three of these challenges while also providing easily interpretable models.