Title: Estimating the number of communities by stepwise goodness-of-fit
Authors: Jiashun Jin - Carnegie Mellon University (United States)
Tracy Ke - University of Chicago (United States)
Shengming Luo - Carnegie Mellon University (United States)
Minzhe Wang - University of Chicago (United States)
Tracy Ke - Harvard University (United States) [presenting]
Abstract: Given a symmetric network with $n$ nodes, how to estimate the number of communities $K$ is a fundamental problem. We propose Stepwise Goodness-of-Fit (StGoF) as a new approach to estimate $K$. For $m = 1, 2,\ldots$, StGoF alternately uses a community detection step and a goodness-of-fit step. We use SCORE for community detection, and propose a goodness-of-fit (GoF) measure. We show that the GoF statistic converges to $N(0,1)$ when $m < K$ and diverges to infinity in probability when $m = K$. Therefore, with a proper threshold, StGoF terminates at $m = K$ as desired. We consider a broad setting where we allow severe degree heterogeneity, a wide range of sparsity, and especially weak signals. In particular, we propose a measure for signal-to-noise ratio (SNR) and show that there is a phase transition: when SNR $\to 0$ as $n\to\infty$, consistent estimates for $K$ do not exist, and when SNR tend to infinity, StGoF is consistent, uniformly for a broad class of settings. In this sense, StGoF achieves the optimal phase transition. Stepwise testing algorithms of a similar kind are known to face analytical challenges. We overcome the challenges by using a different design in the stepwise algorithm and by deriving sharp results in the under-fitting case ($m < K$) and the null case ($m = K$). The key to our analysis is to show that SCORE has the Non-Splitting Property (NSP). The NSPis non-obvious, so additional to rigorous proofs, we also provide an intuitive explanation.