Title: Non-exchangeable random partition model for microclustering
Authors: Giuseppe Di Benedetto - University of Oxford (United Kingdom)
Francois Caron - University of Oxford (United Kingdom) [presenting]
Yee Whye Teh - University of Oxford (United Kingdom)
Abstract: Clustering aims at finding a partition of the data. In a Bayesian framework, this task is addressed by specifying a prior distribution on the partition of the data. Popular models, such as the Chinese Restaurant Process and its two-parameter generalization, rely on an exchangeability assumption. While this assumption may be reasonable for some applications, it has strong implications for the asymptotic properties of the cluster sizes: exchangeable random partitions imply that cluster sizes grow linearly with the number of data points, which is unsuitable for several applications. We will present a flexible non-exchangeable random partition model, based on completely random measures, which generates partitions whose cluster sizes grow almost surely sublinearly. Along with this result, we provide the asymptotic behaviour of the number of clusters and of the proportion of clusters of a given size. Sequential Monte Carlo algorithms are derived for inference, and we illustrate the fit of the model on a movie review dataset.
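
To make the linear-growth remark concrete, the following is a minimal sketch of the standard one-parameter Chinese Restaurant Process, the exchangeable baseline mentioned in the abstract. It is not the non-exchangeable model presented in the talk. The function name `crp_cluster_sizes` and the parameters `n`, `alpha` and `seed` are illustrative choices, not taken from the authors' work.

```python
import random


def crp_cluster_sizes(n, alpha, seed=0):
    """Simulate cluster sizes of a one-parameter Chinese Restaurant Process.

    Illustrative sketch only: n is the number of items, alpha > 0 is the
    concentration parameter, and seed fixes the random number generator.
    """
    rng = random.Random(seed)
    sizes = []  # sizes[k] is the current number of items in cluster k
    for i in range(n):
        # Item i + 1 opens a new cluster with probability alpha / (i + alpha).
        if rng.random() < alpha / (i + alpha):
            sizes.append(1)
        else:
            # Otherwise it joins cluster k with probability sizes[k] / i.
            r = rng.randrange(i)
            acc = 0
            for k, s in enumerate(sizes):
                acc += s
                if r < acc:
                    sizes[k] += 1
                    break
    return sizes


if __name__ == "__main__":
    # Fraction of the data held by the largest cluster, for growing n.
    for n in (500, 1000, 2000):
        sizes = crp_cluster_sizes(n, alpha=1.0)
        print(n, len(sizes), max(sizes) / n)
```

Under this exchangeable model the fraction `max(sizes) / n` typically settles at a positive value as `n` grows, which is the linear growth of cluster sizes described above. A microclustering model is instead designed so that this fraction vanishes asymptotically.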