CMStatistics 2020
Title: Statistical analysis of a hierarchical clustering algorithm with outliers
Authors: Audrey Poterie - Universite Bretagne Sud (France) [presenting]
Laurent Rouviere - Universite Rennes 2 - IRMAR (France)
Nicolas Klutchnikoff - Universite Rennes 2 (France)
Abstract: In unsupervised learning, single linkage is a hierarchical clustering method that recursively merges the two closest clusters, where the distance between two clusters is the minimal distance between their points. Although this procedure has many interesting properties, it is well known that, because of the chaining problem, it usually fails to identify clusters in the presence of outliers (observations that do not belong to any cluster). We propose a new version of this algorithm and study its theoretical performance. In particular, we provide an oracle inequality which ensures that the proposed procedure is efficient under mild assumptions on the size of the clusters. The performance of our approach is also assessed through a simulation study on various synthetic data sets, together with a comparison with some classical clustering algorithms.
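As an illustration of the chaining problem mentioned in the abstract, here is a minimal sketch of classical single linkage in plain Python. It is not the modified procedure proposed by the authors, which the abstract does not describe. The function name `single_linkage` and the toy data sets `clean` and `noisy` are hypothetical, chosen only for this example.

```python
from itertools import combinations
from math import dist


def single_linkage(points, n_clusters):
    """Naive classical single linkage (illustrative only, cubic cost).

    Starts with one cluster per point and repeatedly merges the two
    clusters whose closest pair of points is at minimal distance,
    until n_clusters clusters remain. Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters (a < b) with the smallest
        # minimal inter-point distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: two well-separated square groups.
clean = [(0, 0), (1, 0), (0, 1), (1, 1),
         (5, 0), (6, 0), (5, 1), (6, 1)]

# Same groups plus outliers: three points bridging the gap
# (indices 8 to 10) and one isolated far point (index 11).
noisy = clean + [(2, 0.5), (3, 0.5), (4, 0.5), (3, 10)]

print(single_linkage(clean, 2))
print(single_linkage(noisy, 2))
```

On the clean data the two groups are recovered. On the noisy data the bridging outliers chain the two groups into a single cluster, and the isolated point ends up as a cluster of its own, which is the failure mode the abstract refers to.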