Title: Clustering with latent semiparametric mixture models
Authors: Wen Zhou - Colorado State University (United States) [presenting]
Lyuou Zhang - Colorado State University (United States)
Hui Zou - University of Minnesota (United States)
Lulu Wang - Gilead Sciences (United States)
Abstract: Model-based clustering is a fundamental statistical approach to unsupervised learning with a wide range of applications. While modeling clusters with a mixture distribution is concise and easy to implement, traditional distributional assumptions such as Gaussianity or other parametric forms are stringent in practice and often difficult to verify. Existing efforts to relax such assumptions, on the other hand, are mostly algorithmic and come without performance guarantees. We introduce a novel latent semiparametric mixture model that facilitates clustering without imposing any direct distributional assumptions on the data. Specifically, the model assumes only that the observations are generated by unknown monotone transformations of latent variables governed by a Gaussian mixture. The nontrivial identifiability of the proposed model, which arises from the unknown transformations, is carefully studied. For implementation, we introduce an alternating maximization procedure based on the EM algorithm and carefully investigate its convergence using finite-sample analysis. An interesting transition phenomenon in the convergence of the proposed algorithm, which is due to the presence of the unknown transformations, is explored and guides the execution of the algorithm. This observation also leads to the rate of convergence for the excess mis-clustering error of our method, in comparison with traditional results.
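
To make the data-generating assumption concrete, here is a minimal simulation sketch: latent variables are drawn from a Gaussian mixture, and each coordinate is then passed through a strictly increasing map. It illustrates the model only, not the authors' estimation procedure. The function name `simulate_latent_mixture`, the coordinate-wise form of the transformations, the shared covariance, and all numeric values are assumptions made for this example and do not come from the abstract.

```python
import numpy as np

def simulate_latent_mixture(n, weights, means, cov, transforms, seed=0):
    """Draw n observations X = f(Z), where Z follows a Gaussian mixture.

    Hypothetical helper: applying one monotone map per coordinate and
    sharing one covariance across components are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    # Latent cluster labels drawn with the mixture weights.
    labels = rng.choice(len(weights), size=n, p=weights)
    # Latent Gaussian mixture draws: component mean plus Gaussian noise.
    noise = rng.multivariate_normal(np.zeros(means.shape[1]), cov, size=n)
    latent = means[labels] + noise
    # Observed data: each coordinate goes through its own increasing map.
    observed = np.column_stack(
        [f(latent[:, j]) for j, f in enumerate(transforms)]
    )
    return observed, latent, labels

# Two clusters in two dimensions; exp and the cube are strictly increasing.
X, Z, labels = simulate_latent_mixture(
    n=500,
    weights=[0.5, 0.5],
    means=[[-2.0, 0.0], [2.0, 0.0]],
    cov=np.eye(2),
    transforms=[np.exp, lambda z: z ** 3],
)
```

Because the maps are strictly increasing, the observed coordinates keep the rank order of the latent ones even though their marginal distributions are no longer Gaussian mixtures. In the model described above these maps are unknown and are not supplied to the clustering method.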