Title: Using Bayesian change-point Markov sampler in basecalling of nanopore signals
Authors: Sophia Shen - Macquarie University (Australia) [presenting]
Georgy Sofronov - Macquarie University (Australia)
Abstract: DNA sequencing is an important area of bioinformatics with many crucial applications, such as in forensics to help convict criminals, or in medicine to diagnose diseases like cancer or devise personalised drug interventions. The latest DNA sequencing technologies, which use enzyme-based nanopores, are capable of capturing the long repetitive DNA structures frequently present in introns. The commercialisation of the portable nanopore sequencer MinION has made DNA sequencing more accessible, despite its high error rates. The process of translating raw (electrical) nanopore signals into the genetic alphabet is called basecalling. Statistical methods such as hidden Markov models (HMMs) and recurrent neural networks (RNNs) have been employed to identify DNA bases during basecalling. An alternative algorithm is considered, based on the Gibbs sampler, that allows transitions between different models. The change-point framework is adopted because each base transition can be thought of as a change point in the nanopore signal. Our numerical study shows that the proposed Bayesian change-point algorithm can identify the number of change points and their locations simultaneously, and that its flexibility has the potential to make DNA base identification more robust during basecalling.
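To illustrate the idea that a base transition can be treated as a change point in the signal, the following is a minimal sketch and not the authors' algorithm. It computes the exact posterior over the location of a single change point in a piecewise-constant signal, assuming Gaussian noise with a known standard deviation, flat priors on the two segment means, and a uniform prior on the location. It does not implement the Gibbs sampler described above, which also infers the number of change points. The function name `change_point_posterior` and the parameter `sigma` are hypothetical.

```python
import math


def change_point_posterior(signal, sigma=1.0):
    """Posterior over the index k that splits the signal into signal[:k] and signal[k:].

    Assumptions (illustrative only): exactly one change point, Gaussian noise
    with known standard deviation sigma, flat priors on both segment means,
    and a uniform prior on k.
    """
    n = len(signal)
    # Prefix sums give each segment's sum and sum of squares in O(1).
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def segment_log_marginal(a, b):
        # Log marginal likelihood of signal[a:b] with its mean integrated out,
        # up to a constant that is the same for every k.
        m = b - a
        total = s1[b] - s1[a]
        ss = (s2[b] - s2[a]) - total * total / m  # squared deviations from the segment mean
        return -0.5 * math.log(m) - ss / (2.0 * sigma ** 2)

    ks = list(range(1, n))  # both segments must be non-empty
    log_post = [segment_log_marginal(0, k) + segment_log_marginal(k, n) for k in ks]
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]  # subtract the max for numerical stability
    z = sum(weights)
    return {k: w / z for k, w in zip(ks, weights)}


if __name__ == "__main__":
    # Synthetic two-level signal with small deterministic perturbations;
    # the level values are arbitrary and not taken from real nanopore data.
    levels = [80.0] * 20 + [95.0] * 20
    signal = [v + 0.2 * ((i * 7) % 5 - 2) for i, v in enumerate(levels)]
    posterior = change_point_posterior(signal, sigma=1.0)
    print(max(posterior, key=posterior.get))
```

A sampler that handles an unknown number of change points would additionally need moves that add, remove, or shift change points; the sketch covers only the single-change-point case.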