CMStatistics 2017
Title: Learning large-scale Bayesian networks
Authors: Qing Zhou - UCLA (United States) [presenting]
Bryon Aragam - UCLA (United States)
Arash Amini - UCLA (United States)
Jiaying Gu - UCLA (United States)
Abstract: Learning graphical models from data is an important problem with wide applications, ranging from genomics to the social sciences. Nowadays, datasets typically have thousands, sometimes tens or hundreds of thousands, of variables and far fewer samples. To meet this challenge, we develop theory and algorithms for learning the structure of large Bayesian networks, represented by directed acyclic graphs (DAGs). Our theoretical results establish support recovery guarantees and deviation bounds for a family of penalized least-squares estimators under concave regularization, including many popular regularizers such as the MCP, SCAD, $\ell_{0}$, and $\ell_{1}$. The proof relies on interpreting a DAG as a recursive linear structural equation model, which reduces the estimation problem to a series of neighborhood regressions. We apply these results to study the statistical properties of score-based DAG estimators, learning causal DAGs, and inferring conditional independence relations via graphical models. Our algorithms are implemented in a new open-source R package, sparsebn, available on CRAN. This package focuses on the unique setting of learning large networks from high-dimensional data, possibly with interventions, and places a premium on scalability.
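The reduction described above can be sketched in miniature: under a recursive linear structural equation model with a known variable ordering, each node's parent set is estimated by a penalized regression of that node on its predecessors. The toy below is an illustrative pure-Python sketch using an $\ell_{1}$ (lasso) penalty fit by coordinate descent; it is not the sparsebn implementation, and all function names, coefficients, and the penalty value are invented for the example.

```python
import random

def lasso_cd(X, y, lam, n_iter=300):
    """Coordinate-descent lasso: minimize 0.5*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # residual with feature j's own contribution removed
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            # soft-thresholding update for coordinate j
            if rho > lam:
                beta[j] = (rho - lam) / z
            elif rho < -lam:
                beta[j] = (rho + lam) / z
            else:
                beta[j] = 0.0
    return beta

random.seed(0)
n = 300
# Simulate a recursive linear SEM over the ordering (X1, X2, X3):
#   X1 = e1,  X2 = 0.8*X1 + e2,  X3 = 0.9*X2 + e3   (coefficients invented)
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + random.gauss(0, 0.5) for a in x1]
x3 = [0.9 * b + random.gauss(0, 0.5) for b in x2]

# Neighborhood regression for node 3: penalized regression on its predecessors.
# The support (nonzero coefficients) is the estimated parent set of X3.
X = [[x1[i], x2[i]] for i in range(n)]
beta = lasso_cd(X, x3, lam=30.0)
parents = [j for j, b in enumerate(beta) if abs(b) > 1e-8]
print("estimated parents of X3:", parents,
      "coefficients:", [round(b, 2) for b in beta])
```

Repeating this regression for every node (each on its predecessors) assembles an estimate of the whole DAG, which is the structure the abstract's estimators exploit; sparsebn pairs this idea with concave penalties such as the MCP and algorithms that scale to thousands of nodes.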