CMStatistics 2021
Title: On the detection of bots in online surveys
Authors: Carl Falk - McGill University (Canada) [presenting]
Michael John Ilagan - McGill University (Canada)
Abstract: Academic research that collects survey responses online through crowdsourcing platforms has become increasingly prevalent in the social sciences. However, the increased anonymity of participants, coupled with monetary compensation, may result in the contamination of such data by bots or random responders. While a number of outlier detection indices are often recommended to detect bots, their combined use in practice is hampered by the lack of recommendations for empirically derived cut-off values. We propose and compare four algorithms that could be used to classify bots in an unsupervised manner while leveraging such outlier detection indices. These algorithms rely on the assumptions that bots are exchangeable random vectors and that detection indices tend to separate humans and bots. Permutations are then used to derive an accompanying test and/or inform clustering techniques. In simulations, some of the studied techniques achieved about 90-95% accuracy across conditions ranging from low bot contamination (5%) to high bot contamination (95%). Given that data collection often involves multi-item scales with Likert-type items, additional discussion focuses on 1) scale/study design conditions under which we would expect such algorithms to encounter difficulty; and 2) the potential for indices derived from psychometric models (e.g., based on item response theory) to detect clusters of bots.
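
The permutation idea in the abstract can be illustrated with a minimal sketch. This is not one of the four proposed algorithms, whose details the abstract does not give. It assumes a single illustrative index (the correlation between a respondent's answers and the sample item means) and a hypothetical simulated data set; the function names, sample sizes, and item locations are all assumptions made for the example. If a bot's responses are exchangeable across items, shuffling them should leave the index's distribution unchanged, so the shuffled values can serve as a reference distribution.

```python
import numpy as np


def person_total_correlation(row, item_means):
    """Illustrative index: correlation of one respondent's answers with the item means."""
    if np.std(row) == 0 or np.std(item_means) == 0:
        return 0.0  # correlation is undefined for constant vectors; treat as no signal
    return float(np.corrcoef(row, item_means)[0, 1])


def permutation_p_values(responses, n_perm=200, seed=0):
    """Per-respondent permutation p-value for the index above.

    Under the assumption that a bot's answers are exchangeable across items,
    permuting a bot's row does not change the distribution of its index.
    A small p-value therefore suggests a non-exchangeable (human-like) respondent.
    """
    rng = np.random.default_rng(seed)
    item_means = responses.mean(axis=0)
    p_values = np.empty(responses.shape[0])
    for i, row in enumerate(responses):
        observed = person_total_correlation(row, item_means)
        null = np.array([
            person_total_correlation(rng.permutation(row), item_means)
            for _ in range(n_perm)
        ])
        # Add-one correction keeps the p-value strictly positive.
        p_values[i] = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return p_values


# Hypothetical simulated data: 80 humans and 20 bots on 20 five-point Likert items.
sim_rng = np.random.default_rng(1)
n_humans, n_bots, n_items = 80, 20, 20
item_locations = np.linspace(1.5, 4.5, n_items)  # assumed item "difficulty" pattern
humans = np.clip(
    np.rint(item_locations + sim_rng.normal(0.0, 0.7, size=(n_humans, n_items))), 1, 5
)
bots = sim_rng.integers(1, 6, size=(n_bots, n_items)).astype(float)  # uniform random answers
responses = np.vstack([humans, bots])

p = permutation_p_values(responses)
print("median p-value, humans:", np.median(p[:n_humans]))
print("mean p-value, bots:", np.mean(p[n_humans:]))
```

In this sketch, respondents with large p-values are the ones whose index looks like that of a shuffled response vector, which is what the exchangeability assumption predicts for bots. A fuller treatment would combine several indices and turn the p-values into a classification, for example through clustering as the abstract mentions.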