Title: Comparing classification methods and their generalisability on antibody sequences
Authors: Lutecia Servius - King's College London (United Kingdom) [presenting]
Davide Pigoli - King's College London (United Kingdom)
Joseph Ng - King's College London (United Kingdom)
Franca Fraternali - King's College London (United Kingdom)
Abstract: Class Switch Recombination (CSR) is a biological process in which antibodies change isotype to adapt their function. The mechanism of CSR is not well understood, but high-throughput sampling of antibody sequences from human samples offers an opportunity to build data-driven models to understand its determinants. The performance of logistic regression (LR), LASSO logistic regression (LASSO) and random forest (RF) classifiers is compared on Respiratory Syncytial Virus (RSV) and hospitalised COVID-19 patient antibody sequence data sets. A method of data processing will be presented that seeks to isolate the signal in the data and reduce noise, as well as a testing procedure to target model generalisability. These analyses suggest that there is a signal in the antibody sequences that indicates CSR; thus, results should be generalisable.