Title: Dealing with a small number of large clusters using iterative bootstrap
Authors: Stephane Heritier - Monash University (Australia) [presenting]
Maria-Pia Victoria-Feser - University of Geneva (Switzerland)
Stephane Guerrier - Pennsylvania State University (United States)
Abstract: Generalized estimating equations (GEE) are commonly used in cluster randomized trials (CRTs) to account for within-cluster correlation. It is well known that the sandwich variance estimator is biased when the number of clusters is small ($<40$), resulting in an inflated type I error rate. The problem is particularly acute with binary outcomes, a common situation in medicine. Various bias correction methods have been proposed in the statistical literature, but they are bound to fail because they rely on asymptotic formulas used beyond their domain of validity. The situation is becoming alarming in multi-period CRTs such as stepped-wedge or cluster crossover designs, where it is commonplace to have data with 10 to 20 large clusters, sometimes even fewer. We propose a radically new approach that does not rely on first-order asymptotics. The method starts with a simple estimator of an auxiliary parameter, which is then corrected to estimate the main parameter of interest with virtually no bias. Inference is possible through a nearly exact distribution obtained by simulation using the iterative bootstrap. We illustrate the performance of this approach for binary clustered data with $n=10$ to $20$ clusters. The method is general enough to accommodate other models, such as generalized mixed models, and different endpoints.
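The correction step described above can be illustrated on a toy problem. The sketch below is not the authors' GEE implementation; it applies the generic iterative-bootstrap idea to a deliberately biased auxiliary estimator (the divide-by-$n$ variance MLE under a normal model), repeatedly updating the parameter by the gap between the observed auxiliary estimate and its Monte Carlo expectation under the current parameter. All function names and settings (`pi_hat`, `simulate`, the number of bootstrap draws `B`) are illustrative choices, not part of the source.

```python
import numpy as np

def pi_hat(x):
    # Auxiliary estimator: variance MLE (divides by n), biased downward
    # in small samples by the factor (n - 1) / n.
    return np.mean((x - x.mean()) ** 2)

def simulate(theta, n, rng):
    # Generate a sample of size n under the candidate parameter theta
    # (normal model with variance theta).
    return rng.normal(0.0, np.sqrt(theta), size=n)

def iterative_bootstrap(x, B=500, iters=10, seed=0):
    """Bias-correct pi_hat by the iterative bootstrap:
    theta <- theta + pi_hat(x) - E_theta[pi_hat], with the expectation
    approximated by B simulated samples at each iteration."""
    rng = np.random.default_rng(seed)
    n = len(x)
    pi0 = pi_hat(x)      # observed auxiliary estimate
    theta = pi0          # initial value for the parameter of interest
    for _ in range(iters):
        # Monte Carlo approximation of E_theta[pi_hat]
        pi_star = np.mean([pi_hat(simulate(theta, n, rng)) for _ in range(B)])
        theta = theta + pi0 - pi_star
    return theta
```

In this linear toy case the fixed point is $\theta = \frac{n}{n-1}\,\hat\pi$, i.e. the usual unbiased variance estimator, so the iteration recovers the known correction; in the CRT setting no closed-form correction is available and the same simulation-based update does the work. The same simulated samples also yield the nearly exact reference distribution the abstract uses for inference.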