CMStatistics 2022
B0973
Title: A generalization of the latent Dirichlet allocation
Authors: Roberto Ascari - University of Milano-Bicocca (Italy) [presenting]
Alice Giampino - University of Milano-Bicocca (Italy)
Abstract: Over recent years, text modeling techniques have been employed in several applications, including the detection of latent topics in text documents. A widespread statistical tool for topic modeling is the Latent Dirichlet Allocation (LDA), which represents each document in terms of its topic composition. A well-known limitation of the LDA is the rigidity of the Dirichlet prior imposed on the topic distributions. The aim is to perform a preliminary study of the flexible Dirichlet (FD) as an alternative prior. The FD is a generalization of the Dirichlet distribution that allows for a finite mixture structure. Its additional parameters provide more flexibility while maintaining model interpretability and conjugacy to the multinomial model. Conjugacy, in turn, allows for an estimation procedure based on collapsed Gibbs sampling. The generalization of the LDA based on the FD distribution is illustrated via an application to a real dataset.
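The finite mixture structure mentioned in the abstract can be illustrated with a minimal sampling sketch. It assumes the parameterization commonly used in the FD literature, which the abstract does not spell out: a draw comes from a Dirichlet with parameter alpha + tau * e_k, where the component k is chosen with probability p_k. The function name `sample_flexible_dirichlet` and its arguments are hypothetical, and this is not the authors' code.

```python
import numpy as np

def sample_flexible_dirichlet(alpha, p, tau, size, rng=None):
    """Draw `size` vectors from a flexible Dirichlet, assuming the mixture
    form sum_k p_k * Dirichlet(alpha + tau * e_k) (an assumption, see text)."""
    rng = np.random.default_rng(rng)
    alpha = np.asarray(alpha, dtype=float)
    p = np.asarray(p, dtype=float)
    D = alpha.shape[0]

    # Choose one mixture component per draw, with probabilities p.
    comp = rng.choice(D, size=size, p=p)

    # Build each draw's Dirichlet parameter: alpha with tau added to the
    # coordinate of the selected component.
    params = np.tile(alpha, (size, 1))
    params[np.arange(size), comp] += tau

    # Sample the Dirichlet draws as independent gammas normalized to sum to one.
    g = rng.gamma(shape=params)
    return g / g.sum(axis=1, keepdims=True)

# Example: three topics, with extra mass tau pushed toward one topic per draw.
draws = sample_flexible_dirichlet(
    alpha=[1.0, 1.0, 1.0], p=[0.5, 0.3, 0.2], tau=5.0, size=1000, rng=0
)
```

Under this assumed form, setting tau to 0 recovers an ordinary Dirichlet(alpha), which is one way to see the FD as a generalization of the standard LDA prior.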