Title: Mixture modeling of mixed type data for clustering
Authors: Eman Alamer - McMaster University (Canada)
Paul McNicholas - McMaster University (Canada)
Michael Gallaugher - Baylor University (United States) [presenting]
Abstract: Clustering, also known as unsupervised classification, forms the foundation of machine learning techniques and is used to find underlying group structures in data. There are many well-established model-based techniques for clustering either categorical or continuous data; however, there is a relative paucity of work for mixed-type data, especially when the continuous variables exhibit skewness and/or heavy tails. We consider two avenues. The first addresses the case where the continuous variables exhibit skewness and heavy tails, by combining a latent variable model with the skew-t distribution for modelling the continuous variables. The second addresses outlier detection for mixed-type data, by combining a latent variable model with the contaminated normal distribution. Both simulated and real data will be used for illustration.