B1823
Title: Parallel tempering with a variational reference
Authors: Saifuddin Syed - University of Oxford (United Kingdom) [presenting]
Trevor Campbell - University of British Columbia (Canada)
Nikola Surjanovic - University of British Columbia (Canada)
Alexandre Bouchard - University of British Columbia (Canada)
Abstract: Sampling from complex target distributions is a challenging task fundamental to Bayesian inference. Parallel tempering (PT) addresses this problem by constructing a Markov chain on the expanded state space of a sequence of distributions interpolating between the posterior distribution and a fixed reference distribution, which is typically chosen to be the prior. However, in the typical case where the prior and posterior are nearly mutually singular, PT methods are computationally prohibitive. We address this challenge by constructing a generalized annealing path connecting the posterior to an adaptively tuned variational reference. The reference distribution is tuned to minimize the forward (inclusive) KL divergence to the posterior distribution using a simple, gradient-free moment-matching procedure. We show that our adaptive procedure converges to the forward KL minimizer, and that the forward KL divergence serves as a good proxy for a previously developed measure of PT performance. We also show that in the large-data limit in typical Bayesian models, the proposed method improves in performance, while traditional PT deteriorates arbitrarily. Furthermore, we introduce PT with two references, one fixed and one variational, with a novel split annealing path that ensures stable variational reference adaptation. Finally, we discuss experiments that demonstrate the large empirical gains achieved by our method in a wide range of realistic Bayesian inference scenarios.
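The gradient-free moment-matching step mentioned in the abstract can be illustrated with a minimal sketch. For a Gaussian reference family, the forward (inclusive) KL divergence from the posterior to the reference is minimized within the family by matching the reference's mean and covariance to those of the posterior, estimated here from samples. The function name and the synthetic "posterior samples" below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np


def moment_match_gaussian(samples):
    """Fit a Gaussian reference by moment matching.

    For an exponential family such as the Gaussian, matching the
    empirical mean and covariance of posterior samples minimizes the
    forward (inclusive) KL divergence KL(posterior || reference)
    within that family -- no gradients required.
    """
    mu = samples.mean(axis=0)                  # empirical mean
    cov = np.cov(samples, rowvar=False)        # empirical covariance
    return mu, cov


# Illustrative stand-in for samples drawn from the PT target chain.
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=0.5, size=(10_000, 3))
mu, cov = moment_match_gaussian(samples)
```

In an adaptive PT scheme along these lines, the fitted `(mu, cov)` would periodically replace the reference's parameters as new samples of the target chain accumulate.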