CMStatistics 2022
B1420
Title: Portfolio construction using robust NLP incorporating noisy social media text
Authors: Jennifer Zou - Harvard University (United States) [presenting]
Roy Welsch - Massachusetts Institute of Technology (United States)
Frank Xing - Nanyang Technological University (Singapore)
Abstract: Social media data provides valuable insight into retail investors' market perceptions in close to real time; however, the signals can be noisy due to misspellings, abbreviations, and other representational differences. Furthermore, natural language processing (NLP) models for handling such texts have been shown to suffer from a number of robustness issues. We present a method for obtaining more robust semantic vector embeddings from social media (Twitter) data by training on a combination of clean and artificially generated noisy texts. We then demonstrate, in simulation, the improved performance of portfolios constructed using these robust estimates.
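The idea of training on clean text together with artificially generated noisy text can be illustrated with a minimal Python sketch. This is not the authors' procedure: the abstract does not say how the noise is generated, so the abbreviation table, the three character-level edits, the noise rate, and the function names (perturb_word, add_noise, build_training_corpus) are all assumptions made for illustration.

```python
import random

# Hypothetical abbreviation table; not taken from the abstract.
ABBREVIATIONS = {"you": "u", "are": "r", "please": "pls", "because": "bc"}


def perturb_word(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit (swap, drop, or repeat) to a word."""
    if len(word) < 3:
        return word  # too short to perturb without destroying it
    i = rng.randrange(1, len(word) - 1)  # interior position, so the first letter is kept
    op = rng.choice(["swap", "drop", "repeat"])
    if op == "swap":  # transpose characters i and i + 1
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "drop":  # delete character i
        return word[:i] + word[i + 1:]
    return word[:i] + word[i] + word[i:]  # duplicate character i


def add_noise(text: str, rng: random.Random, rate: float = 0.3) -> str:
    """Return a noisy copy of text; each word is altered with probability ~rate."""
    out = []
    for word in text.split():
        lower = word.lower()
        if lower in ABBREVIATIONS and rng.random() < rate:
            out.append(ABBREVIATIONS[lower])  # simulate an abbreviation
        elif rng.random() < rate:
            out.append(perturb_word(word, rng))  # simulate a misspelling
        else:
            out.append(word)
    return " ".join(out)


def build_training_corpus(clean_texts, copies=1, rate=0.3, seed=0):
    """Combine the clean texts with `copies` noisy variants of each one."""
    rng = random.Random(seed)  # seeded for reproducibility
    corpus = list(clean_texts)
    for text in clean_texts:
        for _ in range(copies):
            corpus.append(add_noise(text, rng, rate))
    return corpus


if __name__ == "__main__":
    clean = ["you are buying the dip because earnings look strong"]
    for line in build_training_corpus(clean, copies=2):
        print(line)
```

The combined corpus would then be passed to whatever embedding model is being trained. Keeping the clean texts alongside their noisy variants is the design choice the abstract describes; everything else in this sketch is a placeholder.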