- Loss-Based Estimation with Cross-Validation: Applications to Microarray Data Analysis and Motif Finding
- Sandrine Dudoit, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Mark J. van der Laan, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Sunduz Keles, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Annette M. Molinaro, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Sandra E. Sinisi, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Siew Leng Teng, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Published 2005 in G. Piatetsky-Shapiro and P. Tamayo (eds.), Microarray Data Mining, Special Issue of SIGKDD Explorations, Vol. 5, No. 2, pp. 56-68.
- Abstract:
- Current statistical inference problems in genomic data analysis involve
parameter estimation for high-dimensional multivariate distributions, with
typically unknown and intricate correlation patterns among variables.
Addressing these inference questions satisfactorily requires: (i) an
intensive and thorough search of the parameter space to generate good
candidate estimators, (ii) an approach for selecting an optimal estimator
among these candidates, and (iii) a method for reliably assessing the
performance of the resulting estimator.
We propose a unified loss-based methodology for estimator construction,
selection, and performance assessment with cross-validation.
In this approach, the parameter of interest is defined as the risk
minimizer for a suitable loss function and candidate estimators are
generated using this (or possibly another) loss function.
Cross-validation is applied to select an optimal estimator among the
candidates and to assess the overall performance of the resulting
estimator.
This general estimation framework encompasses a number of problems which
have traditionally been treated separately in the statistical literature,
including multivariate outcome prediction and density estimation based on
either uncensored or censored data.
This article provides an overview of the methodology and describes its
application to two problems in genomic data analysis: the prediction of
biological and clinical outcomes (possibly censored) using microarray gene
expression measures and the identification of regulatory motifs (i.e.,
transcription factor binding sites) in DNA sequences.
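
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a squared-error loss, candidate estimators that are polynomial regressions indexed by degree, and 5-fold cross-validation on simulated data. All function names (`squared_error_loss`, `fit_polynomial`, `cv_risk`, `select_by_cross_validation`) and the simulated data are hypothetical choices made for illustration only.

```python
import numpy as np


def squared_error_loss(y, y_hat):
    # Assumed loss function; the abstract allows any suitable loss.
    return (y - y_hat) ** 2


def fit_polynomial(x, y, degree):
    # One candidate estimator: least-squares polynomial of the given degree.
    coefs = np.polyfit(x, y, degree)
    return lambda x_new: np.polyval(coefs, x_new)


def cv_risk(x, y, degree, folds):
    # Average validation-set loss over the folds for one candidate.
    losses = []
    for val_idx in folds:
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[val_idx] = False
        predictor = fit_polynomial(x[train_mask], y[train_mask], degree)
        losses.append(squared_error_loss(y[val_idx], predictor(x[val_idx])))
    return float(np.mean(np.concatenate(losses)))


def select_by_cross_validation(x, y, degrees, n_folds=5, seed=0):
    # Use the same random split for every candidate so risks are comparable.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    risks = {d: cv_risk(x, y, d, folds) for d in degrees}
    best = min(risks, key=risks.get)
    return best, risks


# Simulated data (an assumption): a quadratic signal plus Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.1, 200)

best_degree, risks = select_by_cross_validation(x, y, degrees=range(0, 6))
print("selected degree:", best_degree)
print("cross-validated risks:", risks)
```

The cross-validated risk of the selected candidate gives only a rough performance estimate, because the same splits were used to choose it. Wrapping the whole selection procedure in an outer layer of cross-validation would give a less optimistic assessment.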
- Subject Area:
- Human Genetics, Microarrays, Multivariate Analysis, Statistical Theory and Methods, Survival Analysis
- Suggested Citation:
- Sandrine Dudoit, Mark J. van der Laan, Sunduz Keles, Annette M. Molinaro, Sandra E. Sinisi, and Siew Leng Teng,
"Loss-Based Estimation with Cross-Validation: Applications to Microarray Data Analysis and Motif Finding"
(December 2003).
U.C. Berkeley Division of Biostatistics Working Paper Series.
Working Paper 137.
http://www.bepress.com/ucbbiostat/paper137