CMStatistics 2017
Title: Clustering Airbnb reviews
Authors: Yang Tang - McMaster University (Canada) [presenting]
Paul McNicholas - McMaster University (Canada)
Abstract: A clustering approach is developed for English-language Boston Airbnb reviews collected since 2008. The approach is based on a mixture of latent variable models, which provides an appealing framework for handling clustered binary data. In the broader context of social science applications (e.g., voting data, web reviews, and survey data), extremely large numbers of variables rule out the use of a mixture of latent trait models. A penalized mixture of latent traits approach is therefore developed to reduce the number of parameters and to identify variables that are not informative for clustering. The introduction of component-specific rate parameters avoids the over-penalization that can occur when inferring a shared rate parameter on clustered data. A variational expectation-maximization algorithm is developed and provides closed-form estimates for model parameters; this is in contrast to an intensive search over the rate parameters via a model selection criterion. This approach is important for a whole class of applications, but the focus herein is the Boston Airbnb reviews data.
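
To make the phrase "clustered binary data" concrete, the following is a minimal sketch of how data could be simulated from a mixture of latent trait models. It is an illustration under stated assumptions, not the authors' model or code: it assumes a one-dimensional standard normal latent trait and a logistic link, and the function name, parameter names, and numeric values are all hypothetical. It does not implement the penalty or the variational expectation-maximization algorithm described in the abstract.

```python
import math
import random


def sigmoid(t):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate_mixture_latent_trait(n, weights, intercepts, slopes, seed=0):
    """Draw n binary vectors from a mixture of latent trait models.

    Assumed (hypothetical) parameterization:
      weights[g]       mixing proportion of component g
      intercepts[g][m] intercept of variable m in component g
      slopes[g][m]     slope of variable m on a 1-D latent trait in component g
    """
    rng = random.Random(seed)
    n_components = len(weights)
    rows, labels = [], []
    for _ in range(n):
        # Pick a component according to the mixing proportions.
        g = rng.choices(range(n_components), weights=weights)[0]
        # Continuous latent trait, assumed standard normal.
        y = rng.gauss(0.0, 1.0)
        # Success probability of each binary variable given component and trait.
        probs = [sigmoid(b + w * y) for b, w in zip(intercepts[g], slopes[g])]
        rows.append([1 if rng.random() < p else 0 for p in probs])
        labels.append(g)
    return rows, labels


# Illustrative values only: two components, four binary variables
# (for instance, presence or absence of four words in a review).
rows, labels = simulate_mixture_latent_trait(
    n=200,
    weights=[0.6, 0.4],
    intercepts=[[-1.0, 0.5, 0.0, 1.0], [1.0, -0.5, 0.0, -1.0]],
    slopes=[[1.5, 0.8, 0.0, 0.0], [0.7, 1.2, 0.0, 0.0]],
)
```

Each row is one binary observation and each label is the component that generated it. In this sketch every variable in every component carries its own intercept and slope, so the parameter count grows with both the number of variables and the number of components, which is the kind of growth the abstract says motivates a penalized approach.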