CMStatistics 2022
B1491
Title: Mixed data distances
Authors: Michel van de Velden - Erasmus University Rotterdam (Netherlands) [presenting]
Carlo Cavicchia - Erasmus University Rotterdam (Netherlands)
Alfonso Iodice D'Enza - University of Naples Federico II (Italy)
Angelos Markos - Democritus University of Thrace (Greece)
Abstract: In many statistical methods, distance plays an important role. For instance, data visualization, classification and clustering methods require quantification of distances among objects. How to define such distances depends on the nature of the data and the problem at hand. For distances based on numerical variables, in particular in multivariate contexts, there exist many definitions that depend on the actual observed differences between values. For categorical data, defining a distance is more complex, as the nature of such data prohibits straightforward arithmetic operations. However, various specific measures have been introduced that can be used to quantify observed differences in categorical data. For mixed data, aggregate distances can be constructed by taking a (weighted) sum of the numerical and categorical distances. We consider several definitions for mixed variable distances and show how to implement them efficiently.
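
The weighted-sum construction mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mixed_distance`, the weights `w_num` and `w_cat`, and the two component distances (range-scaled Manhattan for numerical variables, simple matching for categorical variables) are assumptions chosen for illustration only.

```python
import numpy as np

def mixed_distance(num, cat, w_num=1.0, w_cat=1.0):
    """Pairwise mixed-data distances as a weighted sum of two parts.

    num: (n, p) array of numerical variables
    cat: (n, q) array of categorical labels or codes
    Both component distances are illustrative assumptions.
    """
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)

    # Numerical part: Manhattan distance after dividing each column by its range.
    col_range = num.max(axis=0) - num.min(axis=0)
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    scaled = num / col_range
    d_num = np.abs(scaled[:, None, :] - scaled[None, :, :]).sum(axis=2)

    # Categorical part: simple matching, i.e. the number of mismatching variables.
    d_cat = (cat[:, None, :] != cat[None, :, :]).sum(axis=2)

    return w_num * d_num + w_cat * d_cat

# Three objects, one numerical and one categorical variable (made-up toy data).
num = [[0.0], [5.0], [10.0]]
cat = [["a"], ["a"], ["b"]]
D = mixed_distance(num, cat)
print(D)
```

With equal weights, a single categorical mismatch contributes as much as the full range of a numerical variable; other choices of `w_num` and `w_cat` shift that balance between the two variable types.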