View Submission - SDS2022
A0196
Title: A novel fuzzy spectral clustering approach for text data
Authors: Irene Cozzolino - Università La Sapienza (Italy) [presenting]
Maria Brigida Ferraro - Sapienza University of Rome (Italy)
Peter Winker - University of Giessen (Germany)
Abstract: Spectral clustering methods have been widely used in text classification tasks because of their good performance and solid theoretical foundations, which do not assume any predefined structure in the data. However, very few contributions address unsupervised classification. We focus on the use of the spectral clustering algorithm for analysing unlabelled text documents. A crucial step in this method is the construction of an adequate similarity matrix to use as input to the algorithm. This motivated us to introduce a new similarity measure for text data based on a weighted combination of sequence and set similarities, so that it also captures the inherently sequential nature of text, which can be seen as an ordered sequence of words. Furthermore, we introduce a new fuzzy version of spectral clustering for use in combination with the proposed similarity. The resulting document clustering algorithm has been evaluated on benchmark and real data sets.
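The idea of a weighted combination of sequence and set similarities can be illustrated with a minimal sketch. The abstract does not specify the component measures or the weighting scheme, so everything below is an assumption made for illustration: Jaccard similarity on word sets stands in for the set similarity, a normalised longest-common-subsequence length stands in for the sequence similarity, and the weight `alpha` and all function names are hypothetical. This is not the authors' measure.

```python
def tokenize(text):
    """Split a document into lowercase word tokens (assumed preprocessing)."""
    return text.lower().split()


def jaccard(a, b):
    """Set similarity: shared distinct words over all distinct words."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sequence_similarity(a, b):
    """Sequence similarity: LCS length normalised by the longer document."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def combined_similarity(doc_a, doc_b, alpha=0.5):
    """Weighted combination; alpha is a hypothetical weight in [0, 1]."""
    a, b = tokenize(doc_a), tokenize(doc_b)
    return alpha * sequence_similarity(a, b) + (1 - alpha) * jaccard(a, b)


def similarity_matrix(docs, alpha=0.5):
    """Pairwise similarity matrix, the input a spectral method would need."""
    n = len(docs)
    return [[combined_similarity(docs[i], docs[j], alpha) for j in range(n)]
            for i in range(n)]
```

With these stand-in measures, "dog bites man" and "man bites dog" share every word, so their set similarity is 1, but their word order differs, so the sequence term lowers the combined score. A matrix built this way could then be passed to a spectral clustering routine that accepts a precomputed affinity matrix; the fuzzy variant described in the abstract is not sketched here because its details are not given.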