B0930
Title: MDGMM and MIAMI: Towards flexible and interpretable models for mixed data
Authors: Robin Fuchs - CNRS (France) [presenting]
Denys Pommeret - ISFA (Lyon 1) & I2M (France)
Samuel Stocksieker - Institut de Mathematiques de Marseille (France)
Cinzia Viroli - University of Bologna (Italy)
Abstract: Modeling mixed data remains a challenging task due to the heterogeneous nature of the variables (continuous, ordinal, categorical, binary, count). Taking advantage of the recently introduced Deep Gaussian Mixture Models and Generalized Linear Latent Variable Models, we propose two models that can cluster mixed data and generate synthetic mixed data. The resulting multi-layer models, namely the Mixed Deep Gaussian Mixture Model (MDGMM) and MIxed data Augmentation MIxture (MIAMI), explicitly handle the different variable types and learn a continuous, low-dimensional latent representation of the data. This latent representation captures the dependence structure in the data and is used to perform clustering and to generate synthetic data. The models are complemented by visual diagnostic tools, architecture selection methods, and dedicated initialization procedures. Benchmarked against state-of-the-art models on several UCI datasets, the MDGMM and MIAMI deliver solid performance, model flexibility, and interpretable results.