B1639
Title: Policy learning for optimal dynamic treatment regimes with observational data
Authors: Shosei Sakaguchi - University of Tokyo (Japan) [presenting]
Abstract: Statistical decisions for dynamic treatment assignment problems are studied. Many policies assign treatments dynamically: treatments are assigned sequentially to individuals across multiple stages, and the effect of treatment at each stage typically varies with prior treatments, past outcomes, and observed covariates. We consider learning an optimal dynamic treatment regime that guides the treatment assignment for each individual at each stage based on the individual's history. We propose two doubly robust learning approaches that use observational data under the assumption of sequential ignorability. The first approach solves the treatment assignment problem stage by stage through backward induction, and the second approach solves the whole dynamic treatment assignment problem simultaneously across all stages. By combining doubly robust estimators of treatment effect scores with cross-fitting, each approach can achieve the minimax-optimal convergence rate $O_{p}(n^{-1/2})$ of welfare regret even when the nuisance components are estimated nonparametrically.
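To make the idea of doubly robust scores with cross-fitting concrete, the following is a minimal single-stage sketch, not the authors' estimator. It assumes a binary treatment, a discrete covariate, and nuisance functions estimated by simple cell means; the names `aipw_scores`, `learn_policy`, and the small policy class are hypothetical and chosen only for illustration. The multi-stage procedures described in the abstract would apply scores of this kind at each stage, either by backward induction or jointly.

```python
def aipw_scores(data, n_folds=2):
    """Cross-fitted doubly robust (AIPW) scores for a binary treatment.

    data: list of (x, a, y) with a discrete covariate x, treatment a in {0, 1},
    and outcome y. Returns one (score_if_untreated, score_if_treated) pair
    per observation. Nuisances are cell means (an illustrative assumption).
    """
    n = len(data)
    scores = [None] * n
    for k in range(n_folds):
        # Nuisances for fold k are estimated on the other folds only.
        train = [data[i] for i in range(n) if i % n_folds != k]

        def mu(x, d):
            # Outcome regression: mean outcome in the (x, d) cell.
            ys = [y for (xx, aa, y) in train if xx == x and aa == d]
            return sum(ys) / len(ys) if ys else 0.0

        def e(x, d):
            # Propensity of treatment d given x, clipped away from 0 and 1.
            cell = [aa for (xx, aa, _) in train if xx == x]
            if not cell:
                return 0.5
            p = sum(1 for aa in cell if aa == d) / len(cell)
            return min(max(p, 0.05), 0.95)

        for i in range(k, n, n_folds):
            x, a, y = data[i]
            # Regression prediction plus inverse-propensity-weighted residual.
            scores[i] = tuple(
                mu(x, d) + (1.0 if a == d else 0.0) / e(x, d) * (y - mu(x, d))
                for d in (0, 1)
            )
    return scores


def learn_policy(data, policies):
    """Return the name of the policy with the highest estimated welfare."""
    scores = aipw_scores(data)

    def welfare(pi):
        # Average the score of the action each policy would assign.
        return sum(s[pi(x)] for (x, _, _), s in zip(data, scores)) / len(data)

    return max(policies, key=lambda name: welfare(policies[name]))
```

Here a policy is any function from the covariate to an action in {0, 1}, and the learned policy is the empirical welfare maximizer over the supplied class. Restricting the class is a design choice in this sketch; it stands in for the policy class over which regret is measured.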