Title: Bayesian networks with missing data imputation enable exploratory analysis of causal complex biological relationships
Authors: Heather Cordell - Newcastle University (United Kingdom) [presenting]
Richard Howey - Newcastle University (United Kingdom)
Abstract: Bayesian networks can be used to identify possible causal relationships between variables based on their conditional dependencies and independencies, particularly in complex scenarios with many measured variables. When data are missing, the standard approach is to remove every individual with any missing values before performing the Bayesian network analysis. This is wasteful when many individuals have missing data, perhaps for only one or a few variables, which motivates the use of imputation. We present a new imputation method designed to increase the power to detect causal relationships in data that may be a mixture of discrete and continuous variables. The method uses a version of nearest neighbour imputation, whereby missing values for one individual are replaced with the corresponding values from another individual, their nearest neighbour. For each individual with missing data, the subsets of variables used to find the nearest neighbour are chosen by bootstrapping the complete data to estimate a Bayesian network. We show that this approach leads to marked improvements in recall and precision, and we apply it to data from a recent study that investigated the causal relationship between methylation and gene expression in rheumatoid arthritis patients.
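
The nearest neighbour step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `impute_nearest_neighbour`, the row layout, and the distance (a 0/1 mismatch for discrete variables, a squared standardised difference for continuous ones) are assumptions made for illustration. In particular, the sketch matches on every observed variable, whereas the method in the abstract chooses the matching variables from a Bayesian network estimated by bootstrapping the complete data.

```python
import math


def impute_nearest_neighbour(rows, discrete):
    """Fill missing values (None) by copying from the nearest complete row.

    rows     -- list of equal-length lists, one per individual
    discrete -- set of column indices holding discrete variables
    Hypothetical helper for illustration; matching uses all observed columns.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete individual")
    n_cols = len(rows[0])

    # Scale each continuous column by its standard deviation in the complete rows.
    scale = {}
    for j in range(n_cols):
        if j in discrete:
            continue
        vals = [r[j] for r in complete]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        scale[j] = sd if sd > 0 else 1.0

    def distance(a, b, cols):
        # Assumed mixed-data distance: mismatch count plus squared scaled gaps.
        total = 0.0
        for j in cols:
            if j in discrete:
                total += 0.0 if a[j] == b[j] else 1.0
            else:
                total += ((a[j] - b[j]) / scale[j]) ** 2
        return total

    imputed = []
    for r in rows:
        missing = [j for j, v in enumerate(r) if v is None]
        if not missing:
            imputed.append(list(r))
            continue
        observed = [j for j in range(n_cols) if r[j] is not None]
        # The donor is the complete individual closest on the observed variables.
        donor = min(complete, key=lambda c: distance(r, c, observed))
        filled = list(r)
        for j in missing:
            filled[j] = donor[j]
        imputed.append(filled)
    return imputed


# Toy data (invented for the example): columns are continuous, discrete, continuous.
rows = [
    [1.0, "a", 10.0],
    [5.0, "b", 50.0],
    [1.2, "a", None],
    [4.9, None, 48.0],
]
result = impute_nearest_neighbour(rows, discrete={1})
```

Copying observed values from a donor, rather than predicting them from a model, keeps every imputed value within the range actually seen in the data, which is convenient when discrete and continuous variables are mixed.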