CMStatistics 2022
Title: Learning low-dimensional nonlinear structures from high-dimensional noisy data: An integral operator approach
Authors: Rong Ma - Stanford University (United States) [presenting]
Xiucai Ding - UC Davis (United States)
Abstract: A kernel-spectral embedding algorithm is proposed for learning low-dimensional nonlinear structures from noisy and high-dimensional observations, where the datasets are assumed to be sampled from a nonlinear manifold model and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, for a general class of kernel functions, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension and the sample size are comparably large, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove the convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Our results hold even when the dimension of the manifold grows with the sample size. Numerical simulations and analysis of three real datasets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various nonlinear manifolds in diverse applications.
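As a rough illustration of the general idea of a kernel-spectral embedding with a data-driven bandwidth, a minimal Python sketch follows. It is not the authors' algorithm: the Gaussian kernel, the percentile-based bandwidth rule, the function name `kernel_spectral_embedding`, and its parameters `n_components` and `percentile` are all assumptions made here for illustration, since the abstract does not specify these details.

```python
import numpy as np


def kernel_spectral_embedding(X, n_components=2, percentile=50.0):
    """Hypothetical sketch: spectral embedding from a Gaussian kernel matrix.

    The bandwidth is set to a percentile of the pairwise squared distances,
    which is an assumed stand-in for an adaptive bandwidth selection step.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances between the n observations.
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # clip small negatives caused by round-off

    # Assumed bandwidth rule: a percentile of the off-diagonal distances.
    h = np.percentile(d2[np.triu_indices(n, k=1)], percentile)
    if h <= 0:
        raise ValueError("bandwidth is zero; the observations are degenerate")

    # Gaussian kernel matrix, scaled by 1/n.
    K = np.exp(-d2 / h) / n

    # eigh returns eigenvalues in ascending order, so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1][:n_components]

    # Use the leading eigenvectors as embedding coordinates (an assumption).
    return eigvecs[:, order], eigvals[order], h


# Usage: a noisy circle embedded in 100 dimensions (synthetic data).
rng = np.random.default_rng(0)
n, p = 200, 100
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
signal = np.zeros((n, p))
signal[:, 0] = 5.0 * np.cos(theta)
signal[:, 1] = 5.0 * np.sin(theta)
X = signal + 0.5 * rng.standard_normal((n, p))

embedding, eigvals, h = kernel_spectral_embedding(X, n_components=3)
```

The columns of `embedding` could then be passed to a plotting or clustering routine. How the eigenvectors should be scaled, and which of them to keep, depends on the method's actual specification, which this sketch does not attempt to reproduce.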