Title: Characterizing protein-DNA binding event subtypes in ChIP-exo data using read distribution shapes and DNA sequences
Authors: Naomi Yamada - Penn State University (United States) [presenting]
William Lai - Penn State University (United States)
Nina Farrell - The Broad Institute of Harvard and MIT (United States)
Franklin Pugh - Penn State University (United States)
Shaun Mahony - Penn State University (United States)
Abstract: Regulatory proteins associate with the genome either by directly binding cognate DNA motifs or indirectly, via protein-protein interactions with other regulators. The ChIP-exo protocol characterizes protein-DNA interactions at high resolution by combining chromatin immunoprecipitation (ChIP) with 5′-to-3′ exonuclease digestion. Because different regulatory complexes contact DNA differently, analysis of ChIP-exo read distributions (the profiles formed by read counts along the genome) should enable detection of multiple protein-DNA binding modes for a given regulatory protein. To systematically detect multiple protein-DNA interaction modes in a single ChIP-exo experiment, we introduce the ChIP-exo mixture model (ChExMix). ChExMix defines candidate binding event subtypes both by clustering observed ChIP-exo read distribution shapes and by performing targeted de novo motif discovery around the predicted binding events. ChExMix then uses an expectation-maximization learning scheme to probabilistically model the genomic locations and subtype membership of binding events using both ChIP-exo read distributions and DNA sequence information. Using in silico mixed ChIP-exo data, we demonstrate that ChExMix accurately detects and classifies binding event subtypes. We further demonstrate that ChExMix identifies cooperative binding interactions of key transcription factors in MCF-7 cells. Thus, ChExMix can effectively stratify ChIP-exo binding events into biologically meaningful subtypes.
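Illustrative sketch (not the ChExMix implementation): the idea of assigning binding events to subtypes with expectation maximization can be shown on a deliberately simplified problem. The code below assumes each binding event has already been located and summarized as a vector of read counts in fixed offset bins around it, and it fits a plain mixture of multinomial "shapes" to those vectors. The function name `em_shape_mixture`, its parameters, and the binning are hypothetical choices made for this example; the motif information and the search over genomic positions described in the abstract are omitted.

```python
import numpy as np


def em_shape_mixture(counts, n_subtypes, n_iter=100, seed=0, pseudo=1e-3):
    """Toy EM for a mixture of multinomial read-distribution shapes.

    counts: (n_events, n_bins) array of read counts per offset bin.
    Returns mixing weights, per-subtype shapes, and per-event
    subtype responsibilities (soft assignments).
    Assumption: events are pre-aligned; this is not the ChExMix model.
    """
    counts = np.asarray(counts, dtype=float)
    n_events, n_bins = counts.shape
    rng = np.random.default_rng(seed)

    # Random starting shapes (each row is a distribution over bins).
    shapes = rng.dirichlet(np.ones(n_bins), size=n_subtypes)
    weights = np.full(n_subtypes, 1.0 / n_subtypes)

    for _ in range(n_iter):
        # E-step: log-likelihood of each event under each subtype,
        # up to a multinomial constant shared across subtypes.
        log_resp = counts @ np.log(shapes).T + np.log(weights)
        log_resp -= log_resp.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights and shapes from the soft assignments.
        # The pseudocount keeps every probability strictly positive.
        weights = resp.sum(axis=0) + pseudo
        weights /= weights.sum()
        shapes = resp.T @ counts + pseudo
        shapes /= shapes.sum(axis=1, keepdims=True)

    return weights, shapes, resp
```

Calling `em_shape_mixture(counts, n_subtypes=2)` on an events-by-bins count matrix returns soft subtype memberships in `resp`; taking `resp.argmax(axis=1)` gives a hard assignment per event. Like any EM procedure, the result depends on initialization, so in practice one would compare several seeds.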