Title: Nonparametric estimation in a multi-armed bandit problem with covariates
Authors: Wei Qian - University of Delaware (United States) [presenting]
Yuhong Yang - University of Minnesota (United States)
Abstract: The multi-armed bandit problem is a popular online decision-making problem with a wide range of modern applications in data science. In its classical setting, a player faces multiple slot machines (arms) with unknown expected payoffs, and the goal is to design a sequential arm allocation algorithm that maximizes the total payoff. Motivated by promising applications in personalized medicine and online web services, we consider a setting where the mean rewards of the bandit arms depend on covariates. To balance the key tradeoff between exploring for new information and exploiting historical information, we propose a kernel-estimation-based sequential allocation algorithm with randomization, and investigate its asymptotic and finite-time optimality under a nonparametric framework. In addition, since many nonparametric and parametric methods from supervised learning can be applied to estimate the mean reward functions, we integrate a model-combining strategy into the allocation algorithm for adaptive performance. Simulations and real-data evaluations illustrate the algorithm's performance and support the need to account for covariates.
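
Illustrative sketch: the abstract does not specify the kernel, the bandwidth rule, or the form of randomization, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a one-dimensional covariate on [0, 1], a Nadaraya-Watson estimator with a Gaussian kernel for each arm's mean reward, and epsilon-greedy randomization with an exploration probability that decays as 1/sqrt(t). All function names, parameters, and defaults (kernel_estimate, choose_arm, run_simulation, bandwidth, noise_sd) are hypothetical.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the normalizing constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def kernel_estimate(x, history, bandwidth):
    """Nadaraya-Watson estimate of an arm's mean reward at covariate x.

    `history` is a list of (covariate, reward) pairs observed for that arm.
    """
    num = 0.0
    den = 0.0
    for xi, ri in history:
        w = gaussian_kernel((x - xi) / bandwidth)
        num += w * ri
        den += w
    # Fall back to 0.0 when no observation carries weight (assumption of this sketch).
    return num / den if den > 0.0 else 0.0


def choose_arm(x, histories, bandwidth, epsilon, rng):
    """Pick an arm for covariate x: explore with probability epsilon, else exploit."""
    # Pull every arm at least once before relying on the estimates.
    for arm, history in enumerate(histories):
        if not history:
            return arm
    if rng.random() < epsilon:
        return rng.randrange(len(histories))  # randomized exploration
    estimates = [kernel_estimate(x, h, bandwidth) for h in histories]
    return max(range(len(histories)), key=lambda a: estimates[a])


def run_simulation(mean_functions, horizon, bandwidth=0.1, noise_sd=0.1, seed=0):
    """Simulate the allocation rule on synthetic data (assumed setup, for illustration)."""
    rng = random.Random(seed)
    histories = [[] for _ in mean_functions]
    total_reward = 0.0
    for t in range(1, horizon + 1):
        x = rng.random()  # covariate drawn uniformly on [0, 1]
        epsilon = min(1.0, 1.0 / math.sqrt(t))  # assumed decay schedule
        arm = choose_arm(x, histories, bandwidth, epsilon, rng)
        reward = mean_functions[arm](x) + rng.gauss(0.0, noise_sd)
        histories[arm].append((x, reward))
        total_reward += reward
    return histories, total_reward


if __name__ == "__main__":
    # Two hypothetical arms whose best choice depends on the covariate.
    arms = [lambda x: x, lambda x: 1.0 - x]
    histories, total = run_simulation(arms, horizon=500)
    print([len(h) for h in histories], round(total, 2))
```

In this toy setup the better arm switches at x = 0.5, so a covariate-blind rule cannot play the best arm for every covariate value; this is the kind of situation in which the abstract argues covariates need to be taken into account.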