Title: Cluster validation: How to think and what to do
Authors: Christian Hennig - UCL (United Kingdom) [presenting]
Abstract: Cluster analysis is about finding groups in data. There are many cluster analysis methods, and on most datasets the clusterings produced by different methods will not agree. Cluster validation concerns the evaluation of the quality of a clustering. It is often used to compare different clusterings of the same dataset, stemming from different methods or from different parameter choices such as the number of clusters. An overview will be given of techniques for cluster validation, including visualisation methods, methods for assessing the stability of a clustering, tests, validity indexes, and some new measurements of different aspects of cluster validity. The issue of what the "true clusters" are that we want to find will be discussed, along with how this depends on the specific application and on the aims and concepts of the researcher, so that these can be connected to specific techniques for cluster validation. In the literature, the problem of cluster validation is often not well defined, and there is a focus on automatic methods without much understanding of the specific circumstances in which they work (or do not). Some insight into these issues will be provided.