CMStatistics 2022
B0880
Title: Mining for equitable health: Assessing the impact of missing data in electronic health records
Authors: Emily Getzen - University of Pennsylvania (United States) [presenting]
Lyle Ungar - University of Pennsylvania (United States)
Danielle Mowery - University of Pennsylvania (United States)
Xiaoqian Jiang - UTHealth School of Biomedical Informatics (United States)
Qi Long - University of Pennsylvania (United States)
Abstract: Electronic health records (EHRs) contain multiple years of health information that can be leveraged for disease detection and treatment evaluation. However, they lack standardized formatting and can present significant analytical challenges: they contain multi-scale data from heterogeneous domains and include both structured and unstructured data. Data for individual patients are collected at irregular time intervals and with varying frequencies. In addition to the analytical challenges, EHRs can reflect inequity: patients belonging to different groups will have differing amounts of data. The consequence is that the data for marginalized groups may be less informative due to more fragmented care, which can be viewed as a missing data problem. For EHR data in this complex form, there is currently no framework for introducing missing values, and there has been little to no work on assessing the impact of missing data in EHRs. We simulate realistic missing data scenarios in EHRs to adequately assess their impact on predictive modeling. To make the missing data framework more realistic, we use a medical knowledge graph to capture dependencies between medical events. In an intensive care unit setting, we found that missing data have a greater negative impact on the performance of disease prediction models in groups that tend to have less access to healthcare.