- Asymptotics of Cross-Validated Risk Estimation in Estimator Selection and Performance Assessment
- Article comments:
- Published July 2005 in Statistical Methodology, 2(2): 131-154.
- Abstract:
- Risk estimation is an important statistical question for the purposes of
selecting a good estimator (i.e., model selection) and assessing its
performance (i.e., estimating generalization error).
This article introduces a general framework for cross-validation and
derives distributional properties of cross-validated risk estimators in
the context of estimator selection and performance assessment.
Arbitrary classes of estimators are considered, including density
estimators and predictors for both continuous and polychotomous outcomes.
Results are provided for general full data loss functions (e.g., absolute
and squared error, indicator, negative log density).
A broad definition of cross-validation is used in order to cover
leave-one-out cross-validation, V-fold cross-validation, Monte Carlo
cross-validation, and bootstrap procedures.
For estimator selection, finite sample risk bounds are derived and applied
to establish the asymptotic optimality of cross-validation, in the sense
that a selector based on a cross-validated risk estimator performs
asymptotically as well as an optimal oracle selector based on the risk
under the true, unknown data generating distribution.
The asymptotic results are derived under the assumption that the size of
the validation sets converges to infinity and hence do not cover
leave-one-out cross-validation.
For performance assessment, cross-validated risk estimators are shown to
be consistent and asymptotically linear for the risk under the true data
generating distribution, and confidence intervals are derived for this
unknown risk.
Unlike previously published results, the theorems derived in this and our
related articles apply to general data generating distributions, loss
functions (i.e., parameters), estimators, and cross-validation procedures.
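
As a rough illustration of the two uses of cross-validation described in the abstract, the sketch below runs V-fold cross-validation with squared error loss over a small set of candidate predictors, selects the candidate with the smallest cross-validated risk, and reports a Wald-type 95% interval for each risk. Everything here is an assumption made for illustration: the simulated data, the polynomial candidates, the helper names (`v_fold_cv_risk`, `polynomial_fit`, `squared_error`), and the standard error taken from the pooled validation losses. It is not the authors' construction, and the interval for the selected candidate ignores the selection step.

```python
import numpy as np


def squared_error(y_true, y_pred):
    """Squared error loss, evaluated per observation."""
    return (y_true - y_pred) ** 2


def polynomial_fit(degree):
    """Hypothetical candidate estimator: least-squares polynomial of a given degree."""
    def fit(x_train, y_train):
        coef = np.polyfit(x_train, y_train, degree)
        return lambda x_new: np.polyval(coef, x_new)
    return fit


def v_fold_cv_risk(x, y, fit, loss, v=5, seed=0):
    """Return (risk estimate, standard error) from V-fold cross-validation.

    Each observation is in exactly one validation set. The risk estimate is
    the mean of the validation losses; the standard error treats those losses
    as approximately independent (an assumption of this sketch).
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), v)
    losses = np.empty(n)
    for val_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False
        predict = fit(x[train_mask], y[train_mask])  # train on the other folds
        losses[val_idx] = loss(y[val_idx], predict(x[val_idx]))
    return losses.mean(), losses.std(ddof=1) / np.sqrt(n)


# Simulated data (assumed for illustration): linear signal plus Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

# Cross-validated risk for each candidate, using the same splits throughout.
results = {
    degree: v_fold_cv_risk(x, y, polynomial_fit(degree), squared_error)
    for degree in (0, 1, 3)
}

# Estimator selection: pick the candidate with the smallest cross-validated risk.
selected = min(results, key=lambda degree: results[degree][0])

for degree, (risk, se) in results.items():
    lower, upper = risk - 1.96 * se, risk + 1.96 * se
    print(f"degree {degree}: risk {risk:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
print("selected degree:", selected)
```

With 200 observations and V = 5, each validation set holds 40 observations. Leave-one-out cross-validation corresponds to `v = n`, where each validation set has size one, which is the case the asymptotic results in the abstract do not cover.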
- Subject Area:
- Statistical Theory and Methods
- Suggested Citation:
- Sandrine Dudoit and Mark J. van der Laan,
"Asymptotics of Cross-Validated Risk Estimation in Estimator Selection and Performance Assessment"
(February 2003).
U.C. Berkeley Division of Biostatistics Working Paper Series.
Working Paper 126.
http://www.bepress.com/ucbbiostat/paper126
- Previous Versions:
February 05, 2003