View Submission - HiTECCoDES2025
A0191
Title: Sampling uncertainty of research topics
Authors: Anna Staszewska-Bystrova - University of Lodz (Poland) [presenting]
Viktoriia Naboka-Krell - Justus Liebig University of Giessen (Germany)
Victor Bystrov - University of Lodz (Poland)
Peter Winker - University of Giessen (Germany)
Abstract: In latent topic models, estimated topic-word and document-topic probabilities are typically reported with no indication of sampling uncertainty. Without such information, conclusions about topic structure and prevalence can be misleading. We propose to measure sampling uncertainty using a bootstrap method and describe how it can be conveyed by novel types of word clouds that report topic-word probability estimates and by confidence bands designed for time series estimates of topic weights. The new measures and methods are illustrated with an empirical example involving conference abstracts. The results indicate that the robustness of estimated research topics to resampling of documents from the same text collection varies. In particular, some estimated topics may not persist across resampled corpora, and the estimation precision of topic-word probabilities can vary considerably within the same topic. Similar uncertainty is associated with topic prevalence over time. The proposed confidence bands for dynamic topic weights can be used to make inferences about structural changes in research topic trends.
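
The abstract does not specify the bootstrap procedure, so the following is only a minimal sketch of the general idea of document-level resampling, not the authors' method. It assumes that per-document weights for a single topic are already available and grouped by time period, and it computes pointwise percentile intervals for the mean topic weight in each period. The function name `bootstrap_bands` and its parameters are hypothetical. A full procedure would presumably re-estimate the topic model on every resampled corpus and match topics across replications, which this sketch omits.

```python
import random
from statistics import mean


def bootstrap_bands(weights_by_period, n_boot=1000, alpha=0.05, seed=0):
    """Pointwise percentile bootstrap intervals for mean topic weight per period.

    weights_by_period: dict mapping a period label to a list of per-document
    weights of one topic (assumed given; the topic model is NOT re-estimated).
    Returns a dict mapping period -> (lower, point_estimate, upper).
    """
    rng = random.Random(seed)
    bands = {}
    for period, weights in sorted(weights_by_period.items()):
        n = len(weights)
        # Resample documents with replacement and recompute the mean weight.
        boot_means = sorted(
            mean(rng.choices(weights, k=n)) for _ in range(n_boot)
        )
        # Empirical alpha/2 and 1 - alpha/2 quantiles of the bootstrap means.
        lower = boot_means[int((alpha / 2) * n_boot)]
        upper = boot_means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
        bands[period] = (lower, mean(weights), upper)
    return bands


if __name__ == "__main__":
    # Toy weights, purely illustrative (not data from the study).
    toy = {2020: [0.10, 0.20, 0.30, 0.15, 0.25], 2021: [0.40, 0.35, 0.50, 0.45]}
    for period, (lower, point, upper) in bootstrap_bands(toy).items():
        print(period, round(lower, 3), round(point, 3), round(upper, 3))
```

These intervals are pointwise: each period is treated separately, so they do not control joint coverage over the whole time series. Bands intended for inference about structural changes would need to account for that.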