Title: Classification based on dissimilarities
Authors: Beibei Yuan - Leiden University (Netherlands) [presenting]
Willem Heiser - Leiden University (Netherlands)
Mark De Rooij - Leiden University (Netherlands)
Abstract: We introduce the $\delta$-machine, a statistical learning tool for classification based on dissimilarities or distances, $\delta$, between inputs. The first step is to compute Euclidean distances between objects based on the predictor variables. We then define various functions of these distances that produce (dis)similarity kernels. We distinguish four functions: the identity function, the squared function, the exponential decay function, and the Gaussian decay function. The (dis)similarity kernels take the role of predictors in classification techniques, so that classification decisions are based on the dissimilarity of objects to the selected exemplars or prototypes. This leads to nonlinear classification boundaries in the original predictor space. In a simulation study, we compare the resulting dissimilarity-based logistic regressions on three types of artificial data: one with linear classification boundaries and two with nonlinear boundaries. Furthermore, we investigate the effects of noise predictor variables, of sample size, and of the number of predictors. The simulation study shows that, overall, three of the four kernels (all except the squared Euclidean distance) perform very well and are very flexible in the type of data they can handle.
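The pipeline described in the abstract (Euclidean distances, a kernel transformation, then logistic regression on the transformed distances to exemplars) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The decay parameterizations exp(-gamma*d) and exp(-gamma*d^2), the rate `gamma`, the helper names (`euclidean`, `KERNELS`, `dissimilarity_features`, `fit_logistic`, `predict`), the single hand-picked prototype at the origin, and the unregularized gradient-ascent fit are all hypothetical choices made for this example. The abstract does not say how exemplars are selected or how the model is fitted.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two objects described by predictor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The four distance transformations named in the abstract. Identity and squared
# give dissimilarities; the two decay functions give similarities in (0, 1].
# ASSUMPTION: the exact decay parameterization and the rate `gamma` are illustrative.
KERNELS = {
    "identity": lambda d, gamma=1.0: d,
    "squared": lambda d, gamma=1.0: d ** 2,
    "exponential": lambda d, gamma=1.0: math.exp(-gamma * d),
    "gaussian": lambda d, gamma=1.0: math.exp(-gamma * d ** 2),
}

def dissimilarity_features(X, exemplars, kernel="gaussian", gamma=1.0):
    """Replace each object's predictors by its transformed distances to the exemplars."""
    f = KERNELS[kernel]
    return [[f(euclidean(x, e), gamma) for e in exemplars] for x in X]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(F, y, lr=0.5, epochs=5000):
    """Unregularized logistic regression fitted by batch gradient ascent on the log-likelihood."""
    n, p = len(F), len(F[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, target in zip(F, y):
            # Residual between the observed label and the fitted probability.
            err = target - sigmoid(b + sum(wj * fj for wj, fj in zip(w, row)))
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        b += lr * grad_b / n
        w = [wj + lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b

def predict(w, b, F):
    """Assign class 1 when the fitted probability is at least 0.5."""
    return [int(sigmoid(b + sum(wj * fj for wj, fj in zip(w, row))) >= 0.5) for row in F]

# Toy data with a circular (nonlinear) boundary: class 1 lies near the origin.
X_train = [(0.0, 0.0), (0.5, 0.0), (0.0, -0.5), (-0.3, 0.3),
           (2.0, 0.0), (0.0, 2.0), (-2.0, -1.0), (1.5, -1.5)]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
prototypes = [(0.0, 0.0)]  # hypothetical hand-picked prototype (ASSUMPTION)

F_train = dissimilarity_features(X_train, prototypes, kernel="gaussian")
w, b = fit_logistic(F_train, y_train)
print(predict(w, b, F_train))
```

With one prototype at the origin, the Gaussian-decay feature is a decreasing function of the distance to that point. A model that is linear in this feature therefore yields a circular boundary in the original two-dimensional predictor space, a small instance of the nonlinear boundaries mentioned above. Replacing `kernel="gaussian"` with `"identity"`, `"squared"`, or `"exponential"` changes only the feature map.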