View Submission - COMPSTAT2022
Title: Evaluation of supervised learning algorithms for binary classification in sentiment analysis
Authors: Diana Laura Aguirre Capistran - Universidad Veracruzana (Mexico) [presenting]
Emmanuel Morales-Garcia - Universidad Veracruzana (Mexico)
Candy Obdulia Sosa Jimenez - Universidad Veracruzana (Mexico)
Maribel Carmona Garcia - Universidad Veracruzana (Mexico)
Abstract: Over the last decade, the classification of unstructured data has gained great importance as a way to study patterns of behavior. The focus is on classifying words extracted from Twitter as positive (1) or negative (0) in order to evaluate the sentiment of what users tweet. Supervised learning algorithms for binary classification were applied to assign these labels. For text processing, the original matrix was vectorized into a binary structure, and the data were then analyzed with SVM, naive Bayes, decision trees and logistic regression. All techniques were implemented in the statistical software R. The percentage of correct classification was 88.4% for SVM, 88.5% for naive Bayes, 89% for decision trees and 89.3% for logistic regression. In conclusion, the machine learning algorithms worked correctly for classifying sentiment as positive or negative. The SVM classifier had the lowest performance, although it may perform better with a different sample size.
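The pipeline described in the abstract (binary vectorization, a supervised classifier, percentage of correct classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the study used R and compared four classifiers, whereas the sketch below is plain Python, covers only a Bernoulli naive Bayes model with Laplace smoothing, and runs on a few invented example sentences. All function names and data are hypothetical.

```python
import math

def build_vocabulary(texts):
    """Map each distinct lowercase token to a column index (sorted order)."""
    vocab = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def binarize(text, vocab):
    """Return a 0/1 vector: 1 where the vocabulary word occurs in the text."""
    present = set(text.lower().split())
    return [1 if tok in present else 0 for tok in vocab]

def train_bernoulli_nb(X, y):
    """Fit a Bernoulli naive Bayes model with Laplace (add-one) smoothing."""
    model = {}
    n_features = len(X[0])
    for label in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = len(rows) / len(X)
        # Smoothed estimate of P(word present | label); never exactly 0 or 1.
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[label] = (prior, probs)
    return model

def predict(model, x):
    """Return the label with the highest log posterior score."""
    scores = {}
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for xj, p in zip(x, probs):
            score += math.log(p if xj else 1 - p)
        scores[label] = score
    return max(scores, key=scores.get)

def accuracy(y_true, y_pred):
    """Percentage of correctly classified observations."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

# Invented toy data, for illustration only: 1 = positive, 0 = negative.
train_texts = ["great phone love it", "love this great day",
               "terrible service hate it", "hate this awful day"]
train_labels = [1, 1, 0, 0]
test_texts = ["love great service", "awful terrible phone"]
test_labels = [1, 0]

vocab = build_vocabulary(train_texts)
X_train = [binarize(t, vocab) for t in train_texts]
model = train_bernoulli_nb(X_train, train_labels)
predictions = [predict(model, binarize(t, vocab)) for t in test_texts]
print(accuracy(test_labels, predictions))
```

The accuracy function corresponds to the "correct classification percentage" reported in the abstract; the same binary matrix could in principle be passed to any of the other classifiers mentioned.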