CMStatistics 2016
B1585
Title: Using latent Dirichlet allocation on a large commercial scale
Authors: Erik Mathiesen - Octavia.ai (United Kingdom) [presenting]
Abstract: Latent Dirichlet allocation (LDA) is not just an academic artefact but a very powerful tool in practical applications. We present and share some of our insights and experiences from working with large-scale LDA models in a commercial setting. Our applications involve large and diverse corpora that require frequent retraining and adjustment of hyperparameters in a time-sensitive manner. We will address a range of challenges encountered, such as: How do you efficiently train and calibrate your model? How do you determine your hyperparameters? How many of them should you have? How often should you update them? We aim to provide a practical guide to the commercial use of LDA, relying predominantly on heuristics.