- Supervised Detection of Regulatory Motifs in DNA Sequences
- Sunduz Keles, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Mark J. van der Laan, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Sandrine Dudoit, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Biao Xing, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Michael B. Eisen, Dept. of Molecular and Cell Biology, UC Berkeley and Life Sciences Division, Ernest Orlando Lawrence Berkeley National Lab
- Published 2003 in Statistical Applications in Genetics and Molecular Biology, Vol 2, No 1, Article 5
- Abstract:
- Identification of transcription factor binding sites (regulatory motifs) is a major interest in contemporary biology. We propose a new likelihood-based method, COMODE, for identifying structural motifs in DNA sequences. Commonly used methods (e.g., MEME and the Gibbs sampler) model binding sites as families of sequences described by a position weight matrix (PWM) and identify PWMs that maximize the likelihood of the observed sequence data under a simple multinomial mixture model. This model assumes that the positions of the PWM correspond to independent multinomial distributions with four cell probabilities. We address the problem of supervising the search for DNA binding sites using information derived from structural characteristics of protein-DNA interactions. We extend the simple multinomial mixture model by incorporating constraints on the information content profiles or on specific parameters of the motif PWMs. The parameters of this extended model are estimated by maximum likelihood using a nonlinear constrained optimization method. Likelihood-based cross-validation is used to select model parameters such as motif width and constraint type. The performance of COMODE is compared with that of existing motif detection methods on simulated data that incorporate real motif examples from Saccharomyces cerevisiae. The proposed method is especially effective when the motif of interest appears as a weak signal in the data. We also analyzed some of the transcription factor binding data of Lee et al. (2002) using COMODE and identified biologically verified sites.
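To make the quantities named in the abstract concrete, the following is a minimal sketch, not the COMODE implementation. It computes the information content profile of a PWM and the log-likelihood of sequences under a simple two-component multinomial mixture (motif versus background). The function names, the toy PWM, the uniform background, and the treatment of overlapping windows as independent are all illustrative assumptions that do not come from the paper.

```python
import math

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def information_content(pwm):
    """Per-position information content in bits, relative to a uniform background.

    pwm is a list of columns; each column holds four cell probabilities
    in the order A, C, G, T (an assumed layout for this sketch).
    """
    profile = []
    for column in pwm:
        entropy = -sum(p * math.log2(p) for p in column if p > 0)
        profile.append(math.log2(len(ALPHABET)) - entropy)
    return profile


def motif_probability(window, pwm):
    """Probability of a window under the PWM, positions treated as independent."""
    prob = 1.0
    for base, column in zip(window, pwm):
        prob *= column[INDEX[base]]
    return prob


def background_probability(window, background):
    """Probability of a window under a single background multinomial."""
    prob = 1.0
    for base in window:
        prob *= background[INDEX[base]]
    return prob


def mixture_log_likelihood(sequences, pwm, background, motif_weight):
    """Natural-log likelihood under a two-component mixture over windows.

    Assumption: every window of motif width is an independent draw that is
    a motif occurrence with probability motif_weight, else background.
    """
    width = len(pwm)
    total = 0.0
    for seq in sequences:
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            p_motif = motif_probability(window, pwm)
            p_bg = background_probability(window, background)
            total += math.log(motif_weight * p_motif + (1 - motif_weight) * p_bg)
    return total


# Hypothetical width-3 motif: conserved ends, uninformative middle position.
toy_pwm = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
    [0.01, 0.01, 0.97, 0.01],
]
uniform = [0.25, 0.25, 0.25, 0.25]

profile = information_content(toy_pwm)
loglik = mixture_log_likelihood(["TTACGTT", "GAGGCCA"], toy_pwm, uniform, 0.1)
```

In this sketch, a constraint on the information content profile would be a restriction on the shape of `profile` (for example, requiring the end positions to be more informative than the middle), imposed while maximizing `mixture_log_likelihood` over the PWM entries. Candidate motif widths could then be compared by evaluating the fitted model's log-likelihood on held-out sequences.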
- Subject Area:
- Categorical Data Analysis, Human Genetics, Statistical Models, Statistical Theory and Methods
- Suggested Citation:
- Sunduz Keles, Mark J. van der Laan, Sandrine Dudoit, Biao Xing, and Michael B. Eisen,
"Supervised Detection of Regulatory Motifs in DNA Sequences"
(May 2003).
U.C. Berkeley Division of Biostatistics Working Paper Series.
Working Paper 131.
http://www.bepress.com/ucbbiostat/paper131