CMStatistics 2022
B0472
Title: A new BART prior for structured categorical inputs
Authors: Sameer Deshpande - University of Wisconsin–Madison (United States) [presenting]
Abstract: Default implementations of Bayesian Additive Regression Trees (BART) represent categorical predictors using several binary indicators, one for each level of each categorical predictor. Regression trees built on these indicators send one level of a categorical predictor to the left and all other levels to the right. Unfortunately, most partitions of the levels cannot be built with this "remove one at a time" strategy, meaning that default implementations of BART are limited in their ability to "borrow strength" across groups of levels. We overcome this limitation with a prior for a new class of regression trees that can send multiple levels of a categorical variable to each child of a decision node. Our prior corresponds to a partitioning process that respects a priori preferences to co-cluster certain levels of a structured categorical variable. In spatiotemporal applications, such variables are frequently used to encode membership in spatial units like census tracts or counties. In these applications, our new regression trees induce spatially contiguous partitions of the spatial units. Our new prior often yields improved out-of-sample predictive performance without much additional computational burden. Despite its conceptual simplicity, our new prior opens the door to Bayesian treed regression over complex discrete spaces like networks.
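
To make the idea of a spatially contiguous split concrete, the following is a minimal sketch of one way a decision node could send several levels to each child while keeping both groups connected. It is an illustrative assumption, not the procedure from the talk: it grows a random spanning tree of a level-adjacency graph, deletes one tree edge, and uses the two resulting components as the left and right groups. The names `COUNTY_ADJACENCY`, `random_spanning_tree`, `contiguous_split`, and `is_connected` are hypothetical, as is the six-unit grid of spatial units.

```python
import random
from collections import defaultdict

# Hypothetical adjacency graph for six spatial units laid out on a 2 x 3 grid:
#   A B C
#   D E F
COUNTY_ADJACENCY = {
    "A": ["B", "D"],
    "B": ["A", "C", "E"],
    "C": ["B", "F"],
    "D": ["A", "E"],
    "E": ["B", "D", "F"],
    "F": ["C", "E"],
}


def random_spanning_tree(adjacency, rng):
    """Grow a spanning tree by randomized depth-first search.

    Note: this does not sample uniformly over spanning trees; it is only
    a simple way to obtain some random spanning tree.
    """
    nodes = sorted(adjacency)
    root = rng.choice(nodes)
    visited = {root}
    tree_edges = []
    stack = [root]
    while stack:
        node = stack[-1]
        unvisited = sorted(n for n in adjacency[node] if n not in visited)
        if not unvisited:
            stack.pop()
            continue
        nxt = rng.choice(unvisited)
        visited.add(nxt)
        tree_edges.append((node, nxt))
        stack.append(nxt)
    if len(visited) != len(nodes):
        raise ValueError("adjacency graph must be connected")
    return tree_edges


def contiguous_split(adjacency, rng):
    """Split the levels into two connected groups by cutting one tree edge."""
    if len(adjacency) < 2:
        raise ValueError("need at least two levels to split")
    tree_edges = random_spanning_tree(adjacency, rng)
    cut = rng.choice(tree_edges)

    # Adjacency of the spanning tree with the cut edge removed.
    tree_adj = defaultdict(set)
    for u, v in tree_edges:
        if (u, v) != cut:
            tree_adj[u].add(v)
            tree_adj[v].add(u)

    # Flood fill from one endpoint of the cut edge to collect its component.
    left = {cut[0]}
    frontier = [cut[0]]
    while frontier:
        node = frontier.pop()
        for neighbor in tree_adj[node]:
            if neighbor not in left:
                left.add(neighbor)
                frontier.append(neighbor)
    right = set(adjacency) - left
    return left, right


def is_connected(nodes, adjacency):
    """Check that `nodes` induce a connected subgraph of `adjacency`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for neighbor in adjacency[node]:
            if neighbor in nodes and neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return seen == nodes


left, right = contiguous_split(COUNTY_ADJACENCY, random.Random(0))
print(sorted(left), sorted(right))
```

Because removing a single edge from a spanning tree always leaves exactly two connected pieces, both groups are contiguous in the original graph by construction. A one-versus-rest indicator split, by contrast, can only isolate a single level at each decision node.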