Title: Using clustered heat maps to improve the selection of a clustering algorithm and its parametrization
Authors: Leonardo Feltrin - Laurentian University (Canada) [presenting]
Martina Bertelli - Geoness Consulting (Australia)
Abstract: Cluster analysis aims to find groups of similar entities in a data set. One of its outstanding problems is the selection of an appropriate granularity and clustering algorithm. Many solutions evaluate clustering quality in order to select an adequate number of clusters (K) and an optimal classifier. Proposed strategies attempt to automatically locate a set of threshold values that optimize the position of the decision boundaries, often causing unwanted information loss. Interpretive solutions based on external validation measures of cluster quality can be misleading, since the quality indices they provide depend on the validation strategy and are difficult to interpret, which limits the ability to evaluate the partitioning process. More exhaustive information is needed so that domain knowledge can be embedded to improve the classification outcome. A dynamic computational workflow was designed to obtain n-dimensional visualizations of the clustering process, based on clustered heat maps with custom annotations. Synthetic data experiments show that this workflow facilitates the selection of an appropriate granularity level by making the results of multiple clustering algorithms and their respective parametrizations more explicit. This approach exposes some of the weaknesses of external cluster validation methods.
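The abstract does not specify the implementation of the workflow, so the following is only a minimal sketch of the general idea: build the row ordering of a clustered heat map from a dendrogram, then attach one annotation column per algorithm and per value of K so that the partitions can be compared side by side. The function names `make_blobs` and `annotation_table`, the synthetic data, and the choice of Ward linkage and k-means are all assumptions made for illustration, not the authors' method.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, leaves_list
from scipy.cluster.vq import kmeans2


def make_blobs(n_per_cluster=30, seed=0):
    """Hypothetical synthetic data: three well-separated Gaussian blobs in 4-D."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0, 0, 0, 0], [6, 6, 0, 0], [0, 6, 6, 6]], dtype=float)
    X = np.vstack([c + rng.normal(scale=0.5, size=(n_per_cluster, 4)) for c in centers])
    truth = np.repeat(np.arange(len(centers)), n_per_cluster)
    return X, truth


def annotation_table(X, ks=(2, 3, 4, 5), seed=0):
    """Return the dendrogram row order and one label column per (algorithm, K)."""
    Z = linkage(X, method="ward")   # hierarchical tree used to order heat map rows
    order = leaves_list(Z)          # leaf order of the dendrogram
    columns = {}
    for k in ks:
        # Assumed algorithm 1: cut the Ward tree into at most k flat clusters.
        columns[f"ward_k{k}"] = fcluster(Z, t=k, criterion="maxclust")
        # Assumed algorithm 2: k-means with k-means++ initialization.
        _, labels = kmeans2(X, k, minit="++", seed=seed)
        columns[f"kmeans_k{k}"] = labels
    return order, columns


X, truth = make_blobs()
order, columns = annotation_table(X)

# Rows reordered as they would appear in the heat map, with aligned annotations.
heat = X[order]
annotations = {name: labels[order] for name, labels in columns.items()}
```

Here `heat` is the reordered data matrix and `annotations` holds the label columns in the same row order; both could be passed to any heat map plotting routine, where differences between algorithms and values of K become visible as mismatched color blocks along the rows.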