Title: Statistical inference for streaming PCA in high dimensions
Authors: Robert Lunde - Washington University in St Louis (United States) [presenting]
Purnamrita Sarkar - University of Texas at Austin (United States)
Rachel Ward - University of Texas at Austin (United States)
Abstract: The problem of quantifying uncertainty is considered for the estimation error of the leading eigenvector from Oja's algorithm for streaming principal component analysis, where the data are generated i.i.d. from some unknown distribution. By combining classical tools from the U-statistics literature with recent results on high-dimensional central limit theorems for quadratic forms of random vectors and concentration of matrix products, we establish a weighted chi-squared approximation result for the sin-squared error between the population eigenvector and the output of Oja's algorithm. Under certain structural assumptions on the covariance matrix, we show that the error of the weighted chi-squared approximation goes to zero even when $p \gg n$. Furthermore, to facilitate statistical inference, we propose a multiplier bootstrap algorithm that may be updated in an online manner. We establish conditions under which the bootstrap distribution is close to the corresponding sampling distribution with high probability, thereby establishing the bootstrap as a consistent inferential method in an appropriate asymptotic regime.
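For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of the textbook Oja update and of the sin-squared error it is evaluated with. It is not the authors' implementation and does not include their multiplier bootstrap. The simulated covariance (one eigenvalue equal to 5, the rest equal to 1), the step size `eta`, and all function names are illustrative assumptions.

```python
import math
import random


def normalize(w):
    """Rescale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in w))
    return [c / norm for c in w]


def oja_update(w, x, eta):
    """One Oja step: w <- normalize(w + eta * x * (x^T w))."""
    proj = sum(xi * wi for xi, wi in zip(x, w))
    return normalize([wi + eta * proj * xi for wi, xi in zip(w, x)])


def sin_squared_error(w, v):
    """sin^2 of the angle between w and v, i.e. 1 - (w^T v)^2 for unit vectors."""
    w, v = normalize(w), normalize(v)
    inner = sum(wi * vi for wi, vi in zip(w, v))
    return 1.0 - inner * inner


# Assumed toy setting: Gaussian data with covariance diag(5, 1, ..., 1),
# so the population leading eigenvector is the first coordinate axis.
rng = random.Random(0)
p, n = 10, 4000
lam1, lam2 = 5.0, 1.0
v1 = [1.0] + [0.0] * (p - 1)

# Assumed step size of order log(n) / (n * eigengap); a common heuristic,
# not necessarily the choice analysed in the talk.
eta = math.log(n) / (n * (lam1 - lam2))

# Random unit-norm initialisation, then a single pass over the stream.
w = normalize([rng.gauss(0.0, 1.0) for _ in range(p)])
for _ in range(n):
    x = [rng.gauss(0.0, 1.0) for _ in range(p)]
    x[0] *= math.sqrt(lam1)  # inflate variance along the leading direction
    w = oja_update(w, x, eta)

err = sin_squared_error(w, v1)
print(f"sin^2 error after {n} samples: {err:.4f}")
```

Each sample is touched once and then discarded, which is what makes the procedure streaming. The sin-squared error is invariant to the sign of the estimate, so it is a natural loss for eigenvector estimation.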