View Submission - COMPSTAT2022

A0489
**Title:** A general framework for implementing distance measures for categorical variables
**Authors:** Michel van de Velden - Erasmus University Rotterdam (Netherlands) **[presenting]**

Alfonso Iodice D'Enza - University of Naples Federico II (Italy)

Angelos Markos - Democritus University of Thrace (Greece)

Carlo Cavicchia - Erasmus University Rotterdam (Netherlands)

**Abstract:** Distance plays an important role in many statistical methods. For instance, data visualization, classification and clustering methods require quantification of distances among objects. How to define such a distance depends on the nature of the data and/or the problem at hand. For numerical variables, in particular in multivariate contexts, there exist many distance definitions that depend on the actual observed differences between values, and it is often necessary to rescale the variables before computing the distances. For categorical data, defining a distance is even more complex, as the nature of such data prohibits straightforward arithmetic operations. Specific measures therefore need to be introduced that can be used to describe or study the structure and/or relationships in the categorical data. We introduce a general framework that allows an efficient and transparent implementation of distances between categorical variables. We show that several existing distances (for example, distance measures that incorporate association among variables) can be incorporated into the framework. Moreover, our framework quite naturally leads to the introduction of new distance formulations as well.
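
As a rough illustration of the kind of computation the abstract refers to, the sketch below builds a distance between objects described by categorical variables by summing, variable by variable, a dissimilarity between the observed categories. This is not the authors' implementation: the function names, the toy data, and the choice of simple matching (0 for equal categories, 1 otherwise) are assumptions made for the example. Swapping in a different per-variable category dissimilarity table is where other distance definitions could plug in.

```python
def simple_matching_delta(categories):
    """Category dissimilarity table: 0 if the categories match, 1 otherwise."""
    return {(a, b): 0.0 if a == b else 1.0 for a in categories for b in categories}


def categorical_distance(x, y, deltas, weights=None):
    """Weighted sum over variables of the dissimilarity between observed categories."""
    if weights is None:
        weights = [1.0] * len(x)
    return sum(w * d[(a, b)] for a, b, d, w in zip(x, y, deltas, weights))


# Hypothetical toy data: rows are objects, columns are categorical variables.
data = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "large"),
]

# One dissimilarity table per variable, built from the categories seen in the data.
levels = [sorted({row[j] for row in data}) for j in range(len(data[0]))]
deltas = [simple_matching_delta(cats) for cats in levels]

# Pairwise distance matrix between all objects.
dist = [[categorical_distance(a, b, deltas) for b in data] for a in data]
```

Here `dist[0][2]` is 2.0 because the first and third objects differ on both variables, while `dist[0][1]` is 1.0 because they differ only on the second.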
