CMStatistics 2017
Title: ScAMP: Scalable automated metagenomics pipeline
Authors:
Lesley Hoyles - Imperial College London (United Kingdom)
James Abbott - Imperial College London (United Kingdom)
Elaine Holmes - Imperial College London (United Kingdom)
Jeremy Nicholson - Imperial College London (United Kingdom)
Marc-Emmanuel Dumas - Imperial College London (United Kingdom)
Sarah Butcher - Imperial College London (United Kingdom)
Ekaterina Smirnova - Virginia Commonwealth University (United States) [presenting]
Abstract: An in-house pipeline was developed for processing and analysing sequence data generated in human microbiome studies. After quality analysis, trimming and filtering, the reads associated with each sample are binned according to whether they represent human, prokaryotic, viral, parasite, fungal or plant DNA. Non-prokaryotic DNA can be assigned to species level on a presence/absence basis, allowing, for example, identification of dietary intake of plant-based foodstuffs and their derivatives. Prokaryotic DNA is subjected to taxonomic analysis using MetaPhlAn2. After de novo assembly of sequence reads and gene prediction, a non-redundant gene catalogue is built. From this catalogue, gene abundance, metagenomic species and microbial gene richness can be determined after normalization of the data. Genes are functionally annotated by mapping against KEGG proteins and CAZy, and with InterProScan. The pipeline was validated using previously generated data. Its outputs allow the development of tools for integrating metagenomic and metabonomic data, moving metagenomic studies beyond the determination of microbial gene richness and representation towards microbial-metabolite mapping.
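The abstract does not say how the catalogue-based data are normalized or how gene richness is computed. The sketch below illustrates one common approach, offered only as an assumption and not as ScAMP's implementation. Read counts per catalogue gene are divided by gene length and rescaled to relative abundances. Gene richness is taken as the number of genes detected (at least one read) after each sample is randomly downsampled to the same read depth. The function names (`length_normalised_abundance`, `rarefy`, `gene_richness`) and the toy counts are hypothetical.

```python
import random
from typing import Dict, List


def length_normalised_abundance(counts: Dict[str, int],
                                lengths: Dict[str, int]) -> Dict[str, float]:
    """Hypothetical helper: divide each gene's read count by its length (bp),
    then rescale so that the sample's abundances sum to 1."""
    per_base = {g: counts[g] / lengths[g] for g in counts}
    total = sum(per_base.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    return {g: v / total for g, v in per_base.items()}


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Hypothetical helper: randomly subsample mapped reads, without
    replacement, down to `depth` reads."""
    reads: List[str] = [g for g, c in counts.items() for _ in range(c)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of mapped reads")
    rng = random.Random(seed)
    sub: Dict[str, int] = {g: 0 for g in counts}
    for g in rng.sample(reads, depth):
        sub[g] += 1
    return sub


def gene_richness(counts: Dict[str, int], depth: int, seed: int = 0) -> int:
    """Assumed definition: number of catalogue genes with >= 1 read
    after rarefaction to a common depth."""
    return sum(1 for c in rarefy(counts, depth, seed).values() if c > 0)


if __name__ == "__main__":
    # Toy read counts mapped to a four-gene catalogue (illustrative only).
    sample = {"geneA": 30, "geneB": 10, "geneC": 0, "geneD": 5}
    lengths = {"geneA": 1500, "geneB": 500, "geneC": 900, "geneD": 1000}
    print(length_normalised_abundance(sample, lengths))
    print(gene_richness(sample, depth=20))
```

Rarefying every sample to the same depth before counting detected genes is intended to stop deeply sequenced samples from appearing richer purely because more reads were generated. In practice the common depth would be chosen from the smallest acceptable library size in the study.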