CMStatistics 2021
B1017
Title: A prior based on allelic partitions for record linkage applications
Authors: Brenda Betancourt - NORC at the University of Chicago (United States) [presenting]
Abstract: In database management, record linkage aims to identify multiple records that correspond to the same individual. This task can be treated as a clustering problem, in which a latent entity is associated with one or more noisy database records. However, in contrast to traditional clustering applications, a large number of clusters with a few observations per cluster is expected in this context. We introduce a new class of prior distributions based on allelic partitions that is especially suited for the small cluster setting of record linkage. We also introduce a set of novel microclustering conditions in order to impose further constraints on the cluster sizes a priori. We evaluate the performance of our proposed class of priors using simulated data and official statistics data sets, and show that our models provide competitive results compared to state-of-the-art microclustering models in the record linkage literature.
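Illustration: the allelic partition of a clustering records how many clusters there are of each size. The minimal Python sketch below computes it from a vector of cluster labels. The function name `allelic_partition` and the toy labels are hypothetical and chosen only for illustration; the sketch shows the summary on which such priors are defined, not the proposed prior itself.

```python
from collections import Counter


def allelic_partition(labels):
    """Map each cluster size to the number of clusters of that size."""
    # Count how many records are assigned to each latent entity.
    cluster_sizes = Counter(labels)
    # Count how many entities have each cluster size.
    return dict(Counter(cluster_sizes.values()))


# Hypothetical toy data: 7 records linked to 5 latent entities.
labels = ["e1", "e1", "e2", "e3", "e3", "e4", "e5"]
partition = allelic_partition(labels)

# Sizes weighted by their counts recover the total number of records.
n_records = sum(size * count for size, count in partition.items())
```

In this toy example there are three singleton clusters and two clusters of size two, which is the many-small-clusters pattern the abstract describes for record linkage.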