CMStatistics 2022
Title: Factor analysis in high dimensional biological data with dependent observations
Authors: Christopher McKennan - University of Pittsburgh (United States) [presenting]
Abstract: Factor analysis is a core component of high-dimensional biological data analysis. However, modern biological data have two key features that undermine existing methods. First, many such data sets, including longitudinal, multi-treatment and multi-tissue studies, contain dependent samples that violate the independence assumptions on which prevailing methods rely. Second, biological data contain factors with large, moderate and small signal strengths, and therefore violate the ubiquitous "pervasive factor" assumption that many methods need in order to perform well. We develop a novel statistical framework and the first set of provably accurate estimators for performing and interpreting factor analysis in dependent data whose factors have signal strengths spanning several orders of magnitude. We prove that this methodology solves many important and previously unsolved problems that routinely arise when analyzing dependent biological data, including high-dimensional covariance estimation, subspace recovery, latent factor interpretation and data denoising. Additionally, we show that our estimator for the number of factors overcomes both the notorious "eigenvalue shadowing" problem and the biases, induced by the pervasive factor assumption, that plague existing estimators. Simulations and real-data analyses demonstrate that our methodology outperforms existing approaches in practice.
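The point about factors with heterogeneous signal strengths can be illustrated with a small simulation. This is a hedged sketch and not the author's estimator: it draws independent samples (precisely the assumption the abstract says real biological data violate) from a three-factor model with one large, one moderate and one small factor, then applies a standard eigenvalue-ratio estimator for the number of factors in the style of Ahn and Horenstein (2013). The function names, the per-gene loading variances (5.0, 0.1, 0.02), the dimensions p = 1000 genes and n = 100 samples, and kmax = 10 are all illustrative assumptions.

```python
import numpy as np


def simulate(p=1000, n=100, strengths=(5.0, 0.1, 0.02), seed=0):
    """Simulate a p x n matrix Y = L C^T + E.

    Assumption: samples are independent (the abstract's setting has
    dependent samples). strengths[k] is the per-gene loading variance
    of factor k, so factor k's signal grows roughly like strengths[k] * p.
    """
    rng = np.random.default_rng(seed)
    K = len(strengths)
    L = rng.standard_normal((p, K)) * np.sqrt(strengths)  # p x K loadings
    C = rng.standard_normal((n, K))                        # n x K factor scores
    E = rng.standard_normal((p, n))                        # unit-variance noise
    return L @ C.T + E


def sample_eigenvalues(Y):
    """Eigenvalues of the n x n matrix Y^T Y / p, in decreasing order."""
    p = Y.shape[0]
    return np.linalg.eigvalsh(Y.T @ Y / p)[::-1]


def eigenvalue_ratio_estimate(eigvals, kmax=10):
    """Return argmax over k = 1..kmax of eigvals[k-1] / eigvals[k]."""
    ratios = eigvals[:kmax] / eigvals[1:kmax + 1]
    return int(np.argmax(ratios)) + 1


if __name__ == "__main__":
    ev = sample_eigenvalues(simulate())
    print("leading eigenvalues:", np.round(ev[:5], 2))
    print("estimated number of factors:", eigenvalue_ratio_estimate(ev))
```

With these settings the strong factor's eigenvalue is dozens of times larger than the next one, so that single ratio dominates every other consecutive ratio and the estimator returns 1 even though three factors are present. This is one concrete form of the undercounting that an implicit pervasive factor assumption can cause; it says nothing about the dependent-sample setting, which this sketch deliberately leaves out.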