View Submission - COMPSTAT2018
Title: Measuring the diffusion of innovations with paragraph vector topic models
Authors: David Lenz - Justus-Liebig University Giessen (Germany) [presenting]
Peter Winker - University of Giessen (Germany)
Abstract: Topic modeling has recently become an intensively researched area, mainly owing to the ever-increasing availability of large digital text collections and to improvements in the methods used to analyze them. In natural language processing, topic modeling refers to a set of methods for extracting the latent topics from a collection of documents. Several new methods have recently been proposed to improve the topic generation process. However, the examination of the generated topics still relies mostly on unsatisfactory practices, for example looking only at the list of most frequent words for a topic. Our contribution is threefold: 1) We present a topic modeling approach based on neural embeddings and Gaussian mixture modeling, which is shown to generate coherent and meaningful topics. 2) We propose a novel "topic report", based on dimensionality reduction techniques and model-generated document vector features, which makes topics easy to identify and significantly reduces the mental overhead required. 3) Lastly, we demonstrate on a technology-related newsticker corpus how our approach could be used by economists to tackle economic problems, for example to measure the diffusion of innovations.
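To make the general idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each document has already been mapped to a vector (here, hand-made 2-D points stand in for paragraph vectors), fits a two-component Gaussian mixture with spherical covariances by EM, and treats each component as a topic. The yearly share of documents assigned to one topic is used as an illustrative diffusion measure; that measure, the toy data, and all names (`docs`, `fit_spherical_gmm`, `NEW_TOPIC`) are assumptions introduced for this example.

```python
import math

# Hypothetical toy corpus: (year, 2-D document vector). Real paragraph vectors
# would come from a trained embedding model and have many more dimensions.
docs = [
    (2015, (0.1, 0.2)), (2015, (-0.2, 0.1)), (2015, (0.3, -0.1)), (2015, (5.1, 4.9)),
    (2016, (0.0, -0.3)), (2016, (0.2, 0.3)), (2016, (4.8, 5.2)), (2016, (5.3, 5.0)),
    (2017, (-0.1, 0.0)), (2017, (4.9, 4.7)), (2017, (5.2, 5.3)), (2017, (5.0, 5.1)),
]


def sq_dist(x, m):
    """Squared Euclidean distance between two vectors."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, m))


def fit_spherical_gmm(vectors, init_means, n_iter=30):
    """Fit a spherical Gaussian mixture by EM; return the responsibilities
    computed in the final E-step (one row per vector, one column per topic)."""
    n, dim, k = len(vectors), len(vectors[0]), len(init_means)
    means = [list(m) for m in init_means]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each vector,
        # computed in log space for numerical stability.
        resp = []
        for x in vectors:
            log_p = [
                math.log(weights[j])
                - 0.5 * dim * math.log(2 * math.pi * variances[j])
                - sq_dist(x, means[j]) / (2 * variances[j])
                for j in range(k)
            ]
            top = max(log_p)
            unnorm = [math.exp(lp - top) for lp in log_p]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(nj / n, 1e-12)
            means[j] = [
                sum(r[j] * x[d] for r, x in zip(resp, vectors)) / nj
                for d in range(dim)
            ]
            spread = sum(r[j] * sq_dist(x, means[j]) for r, x in zip(resp, vectors))
            variances[j] = max(spread / (nj * dim), 1e-6)
    return resp


vectors = [vec for _, vec in docs]
# Deterministic initialisation: one document from each visible cluster.
resp = fit_spherical_gmm(vectors, init_means=[vectors[0], vectors[3]])

# Hard topic assignment: the component with the highest responsibility.
labels = [max(range(len(r)), key=r.__getitem__) for r in resp]

# Illustrative diffusion measure: share of documents per year in one topic.
NEW_TOPIC = 1  # component initialised at the cluster near (5, 5)
shares = {}
for y in sorted({year for year, _ in docs}):
    in_year = [lab for (year, _), lab in zip(docs, labels) if year == y]
    shares[y] = in_year.count(NEW_TOPIC) / len(in_year)

print(shares)
```

On a real corpus one would typically replace the toy vectors with learned document embeddings and use a library mixture model with full covariances; the spherical variant is used here only to keep the sketch dependency-free.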