Title: Projection pursuit in high dimensions
Authors: Peter Bickel - UC Berkeley (United States)
Gil Kur - Weizmann Institute of Science (Israel)
Boaz Nadler - Weizmann Institute of Science (Israel) [presenting]
Abstract: Projection pursuit is a classical exploratory technique for detecting interesting low-dimensional structure in multivariate data. Motivated by contemporary applications, we study its properties in high-dimensional settings. Specifically, we consider projection pursuit on structureless Gaussian data with identity covariance, as both the dimension $p$ and the sample size $n$ tend to infinity, with $p/n$ tending to a constant $c$. Our main results are that: (i) if $c=\infty$, there exist projections whose corresponding empirical cdf can approximate an arbitrary distribution; (ii) if $0<c<\infty$, not all limiting distributions are possible. Yet, depending on the value of $c$, various non-Gaussian distributions may still be approximated. In contrast, if we restrict to sparse projections, involving only a few of the $p$ variables, then asymptotically all empirical cdfs are Gaussian; and (iii) if $c=0$, then asymptotically all projections are Gaussian. Some of these results extend to mean-centered sub-Gaussian data and to projections into $k$ dimensions. Hence, in small $n$, large $p$ settings, unless sparsity is enforced and regardless of the chosen projection index, projection pursuit may detect apparent structure that has no statistical significance. These results reveal fundamental limitations on the ability to detect non-Gaussian signals in high-dimensional data, in particular via independent component analysis (ICA) and related non-Gaussian component analysis.
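
The phenomenon in case (i) can be illustrated with a minimal numerical sketch; it is not the authors' construction. It assumes finite values $n=50$ and $p=2000$ as a stand-in for $p/n \to \infty$, and the function name `projection_matching_target` and the two-point target are hypothetical choices made for illustration. Because a Gaussian data matrix with $p \ge n$ has full row rank almost surely, a direction can be found by least squares whose projected values are proportional to any prescribed target vector, so the shape of the empirical cdf (up to scale, since the direction is normalized to unit length) is arbitrary even though the data are pure noise.

```python
import numpy as np

def projection_matching_target(X, target):
    """Return a unit-norm direction w such that X @ w is proportional to target.

    Hypothetical helper for illustration; assumes X (n x p) has full row rank,
    which holds almost surely for Gaussian data when p >= n.
    """
    # For an underdetermined system, lstsq returns the minimum-norm solution
    # of X w = target, which is exact under the full-row-rank assumption.
    w, *_ = np.linalg.lstsq(X, target, rcond=None)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
n, p = 50, 2000                      # assumed sizes with p much larger than n
X = rng.standard_normal((n, p))      # structureless Gaussian data, identity covariance

# Target shape: a clearly non-Gaussian two-point (bimodal) sample.
target = np.where(np.arange(n) < n // 2, -1.0, 1.0)

w = projection_matching_target(X, target)
proj = X @ w                         # projected sample along the unit direction w

# Correlation with the target measures how well the shape is reproduced.
corr = np.corrcoef(proj, target)[0, 1]
print(f"correlation between projection and target: {corr:.6f}")
```

The projected sample is bimodal although no bimodal structure exists in the population, which is the sense in which an unconstrained projection index can report structure with no statistical significance when $p$ is much larger than $n$.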