CMStatistics 2018
B0818
Title: Clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality
Authors: Jing Ma - Fred Hutch Cancer Research Center (United States) [presenting]
Linjun Zhang - University of Pennsylvania (United States)
Tony Cai - University of Pennsylvania (United States)
Abstract: Unsupervised learning is an important problem in statistics and machine learning with a wide range of applications. We present CHIME, a procedure for clustering high-dimensional Gaussian mixtures that is based on the EM algorithm and a direct estimation method for the sparse discriminant vector. Both the theoretical and the numerical properties of CHIME are investigated. We establish the optimal rate of convergence for the excess mis-clustering error and show that CHIME is minimax rate optimal. In addition, we establish the optimality of the proposed estimator of the discriminant vector. The technical tools developed for the high-dimensional setting can also be used to establish the optimality of clustering Gaussian mixtures in the conventional low-dimensional setting. The merit of CHIME is illustrated on both simulated and real data.
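To make the combination of ideas in the abstract concrete, the following is a minimal sketch of an EM loop for a two-component Gaussian mixture with a common covariance, in which the M-step estimates a sparse discriminant vector directly instead of inverting the covariance. It is not the authors' CHIME implementation. The abstract does not specify the estimator, so the l1-penalized quadratic objective, the coordinate-descent solver, the principal-component initialization, and all names and defaults (`sparse_discriminant`, `em_two_gaussians`, `lam`, `n_iter`) are assumptions made for illustration, and the sketch is only intended for small, well-conditioned toy problems.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_discriminant(S, delta, lam, n_sweeps=50):
    """Coordinate descent for min_b 0.5 * b'Sb - b'delta + lam * ||b||_1.

    Assumed stand-in for a direct sparse estimate of S^{-1} delta;
    requires a strictly positive diagonal of S.
    """
    beta = np.zeros_like(delta)
    for _ in range(n_sweeps):
        for j in range(len(delta)):
            # Partial residual with coordinate j removed.
            r = delta[j] - S[j] @ beta + S[j, j] * beta[j]
            beta[j] = soft_threshold(r, lam) / S[j, j]
    return beta


def em_two_gaussians(X, lam=0.1, n_iter=30):
    """Illustrative EM for a two-component mixture with shared covariance."""
    n, p = X.shape
    # Assumed initialization: split on the sign of the first principal component.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    gamma = (Xc @ Vt[0] > 0).astype(float)  # responsibility of component 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        # M-step: mixing weight, means, pooled weighted covariance.
        w2 = np.clip(gamma.mean(), 1e-3, 1 - 1e-3)
        w1 = 1.0 - w2
        mu1 = ((1 - gamma) @ X) / (1 - gamma).sum()
        mu2 = (gamma @ X) / gamma.sum()
        R1, R2 = X - mu1, X - mu2
        S = ((R1.T * (1 - gamma)) @ R1 + (R2.T * gamma) @ R2) / n
        # Sparse discriminant vector, estimated without inverting S.
        beta = sparse_discriminant(S, mu1 - mu2, lam)
        # E-step: the posterior of component 2 depends on x only through beta.
        logit = np.log(w1 / w2) + (X - (mu1 + mu2) / 2) @ beta
        gamma = 1.0 / (1.0 + np.exp(np.clip(logit, -30, 30)))
    return (gamma > 0.5).astype(int), beta
```

Calling `em_two_gaussians(X)` on an `n x p` data matrix returns a vector of 0/1 cluster labels, defined only up to label switching, together with the estimated discriminant vector. The penalty level `lam` is a hypothetical tuning parameter that would need to be chosen for the data at hand.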