CMStatistics 2018
Title: Distributed learning from multiple EHR databases: Contextual embedding models for medical events
Authors: Qi Long - University of Pennsylvania (United States) [presenting]
Ziyi Li - Emory University (United States)
Xiaoqian Jiang - University of Texas Health Science Center at Houston (United States)
Abstract: Electronic health record (EHR) data offer great promise for personalized medicine. However, EHR data also present analytical challenges due to their irregularity and complexity. In addition, analyzing EHR data raises privacy concerns, and sharing such data across multiple institutions or sites may be infeasible. Recent work uses contextual embedding models to build a single predictive model for more than seventy common diagnoses. Although the existing model achieves relatively high predictive accuracy, it cannot build global models without sharing data among sites. We propose a novel distributed method to learn from multiple databases and build predictive models: Distributed Noise Contrastive Estimation (Distributed NCE). We also extend the proposed method with differential privacy to provide reliable data privacy protection. Our numerical studies demonstrate that the proposed method can build predictive models in a distributed fashion with privacy protection, and that the resulting models achieve prediction accuracy comparable to that of existing methods that use data pooled across all sites.