CMStatistics 2021
Title: Fuzzy spectral clustering for document data sets
Authors: Irene Cozzolino - Università La Sapienza (Italy) [presenting]
Maria Brigida Ferraro - Sapienza University of Rome (Italy)
Peter Winker - University of Giessen (Germany)
Abstract: In recent years, spectral clustering methods have been successfully applied in the field of text classification. Their success rests largely on solid theoretical foundations that make no assumptions about the global structure of the data. Despite their good performance in text classification, comparatively little work has applied them to document clustering. A crucial step in any spectral clustering algorithm is the construction of the similarity matrix used as its input, which should capture the intrinsic structure of the data. To improve clustering performance, and motivated by the inherently sequential nature of text, a new similarity measure is introduced, obtained as a weighted combination of sequence and set similarities. Using sequence similarities alone would ignore the non-sequential part of the documents, which may also be similar in content. We also introduce a novel fuzzy version of spectral clustering for text data, designed to be used with the proposed similarity matrix. The adequacy of the new document clustering method is evaluated on benchmark and real data sets.
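The abstract does not state which sequence and set similarities are combined, how the weight is chosen, or how the fuzzy step works. The Python sketch below is therefore only an illustration under explicit assumptions, not the authors' method: the sequence similarity is assumed to be a normalised longest-common-subsequence (LCS) score over tokens, the set similarity is assumed to be the Jaccard index of the two vocabularies, `alpha` is a hypothetical fixed weight, and the fuzzy step is assumed to be fuzzy c-means applied to a Ng-Jordan-Weiss style spectral embedding. All function names are hypothetical.

```python
import numpy as np


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sequence_similarity(a, b):
    """ASSUMED sequence similarity: LCS length divided by the longer document length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def set_similarity(a, b):
    """ASSUMED set similarity: Jaccard index of the two token sets (order ignored)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def combined_similarity(docs, alpha=0.5):
    """S = alpha * S_seq + (1 - alpha) * S_set; alpha is a hypothetical weight."""
    n = len(docs)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = (alpha * sequence_similarity(docs[i], docs[j])
                       + (1 - alpha) * set_similarity(docs[i], docs[j]))
    return S


def spectral_embedding(S, k):
    """Top-k eigenvectors of D^{-1/2} S D^{-1/2}, rows scaled to unit length."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    X = eigvecs[:, -k:]                 # eigenvectors of the k largest eigenvalues
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def fuzzy_c_means(X, k, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns an (n, k) membership matrix with rows summing to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]          # weighted cluster centres
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                           # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)          # membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U


def fuzzy_spectral_clustering(docs, k, alpha=0.5, m=2.0):
    """Hypothetical pipeline: combined similarity -> spectral embedding -> fuzzy c-means."""
    S = combined_similarity(docs, alpha)
    return fuzzy_c_means(spectral_embedding(S, k), k, m=m)


if __name__ == "__main__":
    docs = [s.split() for s in [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "stock prices fell sharply today",
        "stock prices rose sharply today",
    ]]
    print(fuzzy_spectral_clustering(docs, k=2).round(3))
```

The design choice here is to replace the k-means step of the usual spectral algorithm with fuzzy c-means, so each document receives a membership degree for every cluster instead of a single hard label. In the toy run the two topics share no tokens, the similarity matrix is block diagonal, and the memberships should come out nearly crisp; real documents that share vocabulary would yield genuinely fuzzy memberships.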