Title: Bayesian basket of bandits
Authors: Eric Laber - Duke University (United States) [presenting]
Abstract: Contextual bandit models are a primary tool for sequential decision-making with applications ranging from clinical trials to e-commerce. While there are multiple bandit algorithms which achieve optimal regret and show strong performance on benchmark problems, algorithm selection and tuning in any given application remains a major open problem. We propose the Bayesian Basket of Bandits (B3), a meta-learning algorithm which automatically ensembles a set (basket) of candidate algorithms and tuning procedures to produce a learning algorithm which dominates all algorithms in the basket. The method works by treating the evolution of a bandit algorithm as a Markov decision process in which the states are posterior distributions over model parameters and subsequently applying approximate Bayesian dynamic programming to learn an optimal ensemble. We derive both Bayesian and frequentist convergence results for the cumulative discounted utility. In simulation experiments, our proposed method provides lower regret than state-of-the-art algorithms, including Thompson Sampling, Upper Confidence Bound methods, and information-directed sampling.