COMPSTAT 2018
A0238
Title: Author name identification using a Dirichlet-multinomial regression topic model
Authors: Tomokazu Fujino - Fukuoka Women's University (Japan) [presenting]
Keisuke Honda - The Institute of Statistical Mathematics (Japan)
Hiroka Hamada - The Institute of Statistical Mathematics (Japan)
Abstract: A new framework is proposed for extracting a complete list of the articles written by researchers who belong to a specific research or educational institute from an academic document database such as the Web of Science. The framework is based on Latent Dirichlet Allocation (LDA), a topic model. To improve the framework, we use various techniques and indices, such as synonym retrieval, inverse document frequency and Dirichlet-multinomial regression (DMR). DMR makes it possible to reflect observed features of the articles, such as the authors' affiliations, in the topic distributions derived by LDA. A numerical example is presented to illustrate the framework, and we discuss how much improvement the DMR topic model can provide.
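The two weighting ideas named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook forms rather than the authors' actual implementation: inverse document frequency as idf(w) = log(N / df(w)), and the usual DMR prior, in which each article d gets its own Dirichlet prior over topics, alpha[d, t] = exp(x_d . lambda_t), built from observed features x_d such as a one-hot affiliation. The function names `idf` and `dmr_alpha`, the toy articles, the feature layout and the hand-set weights `lam` are all hypothetical; in a real DMR model lambda would be estimated during inference, not fixed by hand.

```python
import math
import numpy as np


def idf(docs):
    """Inverse document frequency, idf(w) = log(N / df(w)), for every word.

    docs: list of token lists, one per article (hypothetical tokenisation).
    """
    n_docs = len(docs)
    df = {}
    for doc in docs:
        for w in set(doc):  # count each word at most once per article
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n_docs / count) for w, count in df.items()}


def dmr_alpha(x, lam):
    """Article-specific Dirichlet prior in the standard DMR form.

    alpha[d, t] = exp(x[d] . lam[t])
    x:   (D, F) observed article features; column 0 is a constant 1 (intercept)
    lam: (T, F) regression weights, one row per topic
    """
    return np.exp(np.asarray(x, dtype=float) @ np.asarray(lam, dtype=float).T)


# Hypothetical toy data: two articles, two affiliations, two topics.
docs = [["topic", "model", "lda"], ["dirichlet", "regression", "lda"]]
weights = idf(docs)
print(weights["lda"])  # "lda" occurs in every article, so its idf is 0.0

# Features: [intercept, affiliation A, affiliation B] (one-hot affiliation).
x = np.array([[1, 1, 0],
              [1, 0, 1]])
# Hand-set (hypothetical) weights: affiliation A favours topic 0, B topic 1.
lam = np.array([[0.0, 1.0, -1.0],
                [0.0, -1.0, 1.0]])
alpha = dmr_alpha(x, lam)

# Draw each article's topic mixture theta_d ~ Dirichlet(alpha[d]).
rng = np.random.default_rng(0)
theta = np.array([rng.dirichlet(a) for a in alpha])
print(alpha)
print(theta)
```

Because the prior depends on x_d, articles sharing an affiliation start from the same tilted prior over topics, which is the sense in which DMR lets observed features shape the LDA topic distributions.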