Title: Adaptively exploiting d-separators with causal bandits
Authors: Blair Bilodeau - University of Toronto (Canada) [presenting]
Linbo Wang - University of Toronto (Canada)
Daniel Roy - University of Toronto (Canada)
Abstract: Multi-armed bandit problems provide a framework to identify the optimal intervention over a sequence of repeated experiments. Without additional assumptions, minimax optimal performance (measured by cumulative regret) is well-understood. With access to additional observed variables that d-separate the intervention from the outcome (i.e., they are a d-separator), recent ``causal bandit'' algorithms provably incur less regret. However, in practice, it is desirable to be agnostic as to whether observed variables are a $d$-separator. Ideally, an algorithm should be adaptive; that is, perform nearly as well as an algorithm with oracle knowledge of the presence or absence of a $d$-separator. We formalize and study this notion of adaptivity, and provide a novel algorithm that simultaneously achieves (a) optimal regret when a $d$-separator is observed, improving on classical minimax algorithms, and (b) significantly smaller regret than recent causal bandit algorithms when the observed variables are not a d-separator. Crucially, our algorithm does not require any oracle knowledge of whether a $d$-separator is observed. We also generalize this adaptivity to other conditions, such as the front-door criterion.