B1410
Title: Conditional independence testing for categorical data
Authors: Harvey Klyne - University of Cambridge (United Kingdom) [presenting]
Rajen D Shah - University of Cambridge (United Kingdom)
Abstract: The focus is on the problem of testing whether $X$ and $Y$, at least one of which is categorical, are conditionally independent given a random vector $Z$. It is known that when components of $Z$ are continuous, no conditional independence test that is uniformly valid over the null can have power against any alternative, so non-trivial tests can only hope to maintain their level over subsets of the null. We propose a test statistic based on the residuals from regressing a one-hot coding of the categorical data onto $Z$, which provides type I error control whenever the prediction errors decay sufficiently fast. The required rates are slow enough to accommodate settings where $Z$ is high-dimensional or the regression functions are nonparametric; such settings are not covered by, for example, standard log-linear analysis. For cases where the categorical data have a moderate to large number of levels, we propose an algorithm that optimally aggregates levels when performing the test and that also maintains type I error control under similar conditions.
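A minimal sketch of a residual-based statistic of this general kind follows. It is not the authors' statistic. It assumes a scalar $Z$, a real-valued $Y$, a categorical $X$ with a known list of levels, and ordinary least squares as a stand-in for whatever regression method is used. It regresses each one-hot column of $X$ and $Y$ on $Z$, dropping one level because the one-hot columns sum to one. It then averages the products of the residuals and forms a Wald-type statistic, which is compared with a chi-squared distribution with $K-1$ degrees of freedom in the style of a generalised covariance measure. All function names and the simulated data model are hypothetical.

```python
import math
import random


def ols_residuals(z, w):
    """Residuals from a least-squares fit of w on an intercept and scalar z.

    Assumption: a stand-in for any regression whose prediction error decays fast enough.
    """
    n = len(z)
    zbar, wbar = sum(z) / n, sum(w) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szw = sum((zi - zbar) * (wi - wbar) for zi, wi in zip(z, w))
    beta = szw / szz if szz > 0 else 0.0
    return [wi - wbar - beta * (zi - zbar) for zi, wi in zip(z, w)]


def solve(a, b):
    """Solve the linear system a v = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, d):
            f = m[r][c] / m[c][c]
            for k in range(c, d + 1):
                m[r][k] -= f * m[c][k]
    v = [0.0] * d
    for r in reversed(range(d)):
        v[r] = (m[r][d] - sum(m[r][k] * v[k] for k in range(r + 1, d))) / m[r][r]
    return v


def chi2_sf(x, df):
    """P(chi2_df > x) for integer df, via Q(s+1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s+1)."""
    y = x / 2.0
    if df % 2 == 0:
        q, s = math.exp(-y), 1.0  # Q(1, y): the df = 2 case
    else:
        q, s = math.erfc(math.sqrt(y)), 0.5  # Q(1/2, y): the df = 1 case
    while s < df / 2.0:
        q += y ** s * math.exp(-y) / math.gamma(s + 1.0)
        s += 1.0
    return q


def residual_product_test(x, y, z, levels):
    """Hypothetical Wald-type test of X independent of Y given Z; returns (stat, p-value)."""
    n = len(z)
    ry = ols_residuals(z, y)
    # One-hot coding of x; the last level is dropped because the columns sum to 1.
    rx = [ols_residuals(z, [1.0 if xi == lev else 0.0 for xi in x]) for lev in levels[:-1]]
    d = len(rx)
    prods = [[rx[k][i] * ry[i] for i in range(n)] for k in range(d)]
    means = [sum(p) / n for p in prods]
    cov = [[sum((prods[j][i] - means[j]) * (prods[k][i] - means[k]) for i in range(n)) / n
            for k in range(d)] for j in range(d)]
    stat = n * sum(mj * vj for mj, vj in zip(means, solve(cov, means)))
    return stat, chi2_sf(stat, d)


def simulate(n, effect, rng):
    """Hypothetical data: P(X = k | Z) is linear in Z; Y = 2Z + effect * 1{X = 0} + noise."""
    x, y, z = [], [], []
    for _ in range(n):
        zi, u = rng.random(), rng.random()
        xi = 0 if u < zi / 2 else (1 if u < 0.5 else 2)
        x.append(xi)
        y.append(2 * zi + effect * (xi == 0) + rng.gauss(0, 1))
        z.append(zi)
    return x, y, z


rng = random.Random(0)
print(residual_product_test(*simulate(500, 0.0, rng), levels=[0, 1, 2]))  # null holds
print(residual_product_test(*simulate(500, 3.0, rng), levels=[0, 1, 2]))  # alternative
```

Replacing `ols_residuals` with a more flexible regression, such as a nonparametric smoother or a penalised regression when $Z$ is high-dimensional, is where the rate conditions on the prediction errors would come into play.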