CMStatistics 2020
B1003
Title: Machine learning approaches to predicting bacterial infection region of origin from DNA data
Authors: Jordan Taylor - University of Bath (United Kingdom) [presenting]
Abstract: Given that an individual has contracted a virus, it is of interest to predict associated meta-data from the counts of DNA sub-sequences. One such target is the origin of the virus. The current technique in the field is to build a phylogenetic tree from the sub-sequences and, by comparing a similarity measure between points, infer which region of the world the virus strain originated from. Rather than building this computationally expensive phylogenetic tree, we treat the task as a supervised learning problem. We use machine learning techniques to approximate the functional relationship between the observed sequences and the corresponding region. The problem is then a classification problem with sparse, unstructured observations and an imbalanced set of target regions. We present initial results using a mix of non-parametric statistical techniques and machine learning methods.
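As a rough illustration of the setup described in the abstract, the sketch below builds sparse count features from overlapping DNA sub-sequences (k-mers) and derives inverse-frequency class weights, one common way to compensate for imbalanced region labels. The abstract does not specify the feature construction, the value of k, the classifier, or how imbalance is handled, so everything here is an assumption: the function names `kmer_counts` and `balanced_class_weights`, the choice k = 3, and the toy sequences and region labels are hypothetical and are not the author's data or method.

```python
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count overlapping length-k sub-sequences (k-mers) in a DNA string.

    Returns a sparse mapping: only k-mers that occur are stored.
    """
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so a weighted classifier
    does not simply favour the most common region.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


# Toy, made-up data purely for illustration (not from the abstract).
sequences = ["ACGTACGT", "ACGTTTGA", "GGGTACGA", "TTGACCGT"]
regions = ["Europe", "Europe", "Europe", "Asia"]

features = [kmer_counts(s, k=3) for s in sequences]  # sparse count vectors
weights = balanced_class_weights(regions)            # per-class weights
```

The resulting `features` and `weights` could then be passed to any classifier that accepts sparse counts and per-class weights; which classifier suits this problem is exactly the question the abstract leaves open.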