Title: New ideas on Bayesian data sketching
Authors: Rajarshi Guhaniyogi - Texas A&M University (United States) [presenting]
Aaron Scheffler - University of California, San Francisco (United States)
Abstract: Bayesian computation for high-dimensional linear regression models with a popular Gaussian scale mixture prior distribution using Markov chain Monte Carlo (MCMC) or its variants can be extremely slow or completely prohibitive, owing to a heavy computational cost that grows cubically with the number of features. We adopt a data sketching approach that compresses the original samples into $m$ compressed samples via a random linear transformation, and compute the Bayesian regression with Gaussian scale mixture prior distributions on the randomly compressed response vector and feature matrix. The proposed approach yields computational complexity growing cubically in $m$. Another important motivation for this compression procedure is that it anonymizes the data, revealing little information about the original samples in the course of the analysis. A detailed empirical investigation with the Horseshoe prior, a member of the class of Gaussian scale mixture priors, shows closely similar inference and a massive reduction in per-iteration computation time for the proposed approach compared with regression on the full sample. We characterize the dimension $m$ of the compressed response vector as a function of the sample size, the number of predictors, and the sparsity of the regression, so as to guarantee asymptotically accurate estimation of the predictor coefficients even after data compression.
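The compression step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian sketching matrix, uses simulated data with hypothetical dimensions, and substitutes ordinary least squares on the sketch for the full Bayesian regression with a Horseshoe prior.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 2000, 20, 400  # original samples, features, compressed dimension (illustrative)

# Simulated sparse linear model (hypothetical data).
beta = np.zeros(p)
beta[:5] = rng.normal(0.0, 2.0, 5)   # only a few non-zero coefficients
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

# Data sketching: a random linear map Phi (m x n) compresses both the
# response vector and the feature matrix. The downstream sampler sees only
# (Phi y, Phi X), never the original rows, which is the source of both the
# computational savings and the anonymization described in the abstract.
Phi = rng.normal(size=(m, n)) / np.sqrt(m)   # entries N(0, 1/m)
y_sk = Phi @ y       # compressed response, length m
X_sk = Phi @ X       # compressed feature matrix, m x p

# Sanity check with plain least squares on the sketch (a stand-in for the
# Bayesian regression): the coefficients are recovered closely.
beta_hat, *_ = np.linalg.lstsq(X_sk, y_sk, rcond=None)
print(float(np.max(np.abs(beta_hat - beta))))
```

The scaling of `Phi` by $1/\sqrt{m}$ makes the sketch approximately norm-preserving in expectation, so the compressed regression problem retains the geometry of the original one while its size depends on $m$ rather than the original sample size.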