CMStatistics 2022
Title: Unstructured textual data and composite indicators construction
Authors: Camilla Salvatore - University of Milano-Bicocca (Italy) [presenting]
Annamaria Bianchi - University of Bergamo (Italy)
Silvia Biffignandi - University of Bergamo (Italy)
Abstract: This paper presents a novel approach for constructing indicators from social media data in order to augment indicators based on traditional data, and discusses ways to modify the input based on social media data. Topic modelling is used to identify the proportion of text related to the phenomenon under consideration, and these proportions are the input for the indicator construction. However, topic modelling is computationally expensive, and its output might be difficult for final data users to interpret. A more natural approach for the construction of composite indicators is the use of dictionary-based methods. Based on the results of topic modelling (the words most strongly associated with a dimension), we propose to develop a context-specific dictionary that can be used to perform the same analysis more efficiently. One of the main advantages of the dictionary approach is that it can be easily implemented, interpreted, and updated regularly by experts. We propose different methods for constructing dictionaries, considering words, stems, and a list of words augmented by word embeddings. A sensitivity analysis can be performed to assess the stability of the indicator across the different approaches. The empirical application focuses on measuring corporate social responsibility (CSR), for which an original Twitter indicator is developed.
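
The dictionary-based step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary terms, the dimension names, the token-share measure, and the equal-weight aggregation are all assumptions chosen for illustration, and the function names (`tokenize`, `dimension_shares`, `composite_indicator`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical context-specific dictionary: dimension -> set of terms.
# In the approach described above, such terms would be derived from the
# words most associated with each topic; these are illustrative only.
CSR_DICTIONARY = {
    "environment": {"climate", "emissions", "recycling", "sustainability"},
    "community": {"charity", "volunteers", "donation", "education"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dimension_shares(texts, dictionary):
    """Return, per dimension, the share of all tokens found in its dictionary."""
    counts = Counter()
    total_tokens = 0
    for text in texts:
        tokens = tokenize(text)
        total_tokens += len(tokens)
        for dimension, terms in dictionary.items():
            counts[dimension] += sum(1 for token in tokens if token in terms)
    if total_tokens == 0:
        return {dimension: 0.0 for dimension in dictionary}
    return {dimension: counts[dimension] / total_tokens for dimension in dictionary}


def composite_indicator(shares, weights=None):
    """Aggregate dimension shares with a weighted sum.

    Equal weights are used by default; this is an assumption, not a
    weighting scheme taken from the abstract.
    """
    if weights is None:
        weights = {dimension: 1 / len(shares) for dimension in shares}
    return sum(weights[dimension] * shares[dimension] for dimension in shares)


# Made-up example texts standing in for tweets.
tweets = [
    "We cut emissions and expanded recycling this year",
    "Our volunteers supported a local charity",
]
shares = dimension_shares(tweets, CSR_DICTIONARY)
indicator = composite_indicator(shares)
```

Swapping the word sets for stems or for an embedding-augmented word list, and then recomputing `indicator`, would give a simple way to compare the dictionary variants in a sensitivity analysis.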