Title: A new noise-resisting feature-based clustering quality evaluation approach scaling from low to high dimensional data
Authors: Jean-Charles Lamirel - LORIA (France) [presenting]
Abstract: The main concern is optimal model selection in clustering. New quality indexes based on feature maximization are presented for that purpose. Feature maximization is an efficient alternative to usual feature selection measures in high-dimensional spaces, such as Chi-square, vector-based measures using Euclidean distance, correlation, or information gain. The behavior of the new feature-maximization-based indexes is compared with a wide range of usual quality indexes, as well as with a large set of alternative indexes, on different kinds of real-life datasets ranging from low- to high-dimensional data for which ground truth is available. This comparison highlights the better accuracy and stability of the new indexes on these datasets, their efficiency across the low- to high-dimensional range, and their high tolerance to noise. Additional experiments are performed on real-life high-dimensional textual data drawn from a bibliographic database for which ground truth is unavailable. These experiments show that the accuracy and stability of the new indexes make it possible to efficiently perform time-based diachronic analysis, whereas usual indexes do not meet the requirements of this task. The proposed indexes are tested with hard clustering, but a straightforward adaptation to soft clustering is also presented.
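To give a concrete feel for what "feature maximization" measures, here is a minimal sketch of the feature F-measure on which feature-maximization approaches are commonly built. It assumes the usual definitions: for a cluster c and feature f, feature recall FR is the share of f's total weight that falls in c, feature predominance FP is the share of c's total weight carried by f, and the feature F-measure FF is their harmonic mean. The function name `feature_f_measure` and the input layout (a nonnegative document-by-feature weight matrix plus one hard cluster label per document) are illustrative assumptions. This is not the quality indexes presented in the abstract, which are derived from these quantities in ways the abstract does not specify.

```python
import numpy as np


def feature_f_measure(X, labels):
    """Hypothetical helper: feature recall, predominance and F-measure per cluster.

    X      -- (n_docs, n_features) array of nonnegative feature weights
    labels -- hard cluster label for each document (assumed hard clustering)
    Returns (clusters, FR, FP, FF); each matrix has shape (n_clusters, n_features).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)

    # S[k, f] = total weight of feature f over the documents of cluster k
    S = np.vstack([X[labels == c].sum(axis=0) for c in clusters])

    col = S.sum(axis=0, keepdims=True)  # weight of each feature across all clusters
    row = S.sum(axis=1, keepdims=True)  # weight of all features within each cluster

    # Zero denominators would produce NaN; silence the warnings and map them to 0
    with np.errstate(divide="ignore", invalid="ignore"):
        FR = np.where(col > 0, S / col, 0.0)  # feature recall
        FP = np.where(row > 0, S / row, 0.0)  # feature predominance
        FF = np.where(FR + FP > 0, 2 * FR * FP / (FR + FP), 0.0)  # harmonic mean
    return clusters, FR, FP, FF


# Usage: three documents, two features, two clusters
clusters, FR, FP, FF = feature_f_measure([[2, 1], [0, 1], [1, 3]], [0, 0, 1])
print(np.round(FF, 3))
```

By construction, each feature's recall sums to 1 across clusters and each cluster's predominance sums to 1 across features. A feature with a high FF is therefore both concentrated in a cluster and prominent within it, which is the property a feature-maximization approach exploits.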