- The Cross-Validated Adaptive Epsilon-Net Estimator
- Mark J. van der Laan, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Sandrine Dudoit, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Aad W. van der Vaart, Dept. of Mathematics, Vrije Universiteit, Amsterdam
- Abstract:
- Suppose that we observe a sample of independent and identically
distributed realizations of a random variable. Assume that the parameter
of interest can be defined as the minimizer, over a suitably defined
parameter space, of the expectation (with respect to the distribution of
the random variable) of a particular (loss) function of a candidate
parameter value and the random variable.
Examples of commonly used loss functions are the squared error loss
function in regression and the negative log-density loss function in
density estimation.
Minimizing the empirical risk (i.e., the empirical mean of the
loss function) over the entire parameter space typically results in
ill-defined or overly variable estimators of the parameter of interest
(i.e., the risk minimizer for the true data-generating distribution).
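In symbols (our own notation, which may differ from the paper's): write X for the random variable, P_0 for its distribution, Theta for the parameter space and L(theta, X) for the loss. The parameter of interest, the empirical risk minimizer over the whole space, and the two example losses are then

```latex
\theta_0 = \arg\min_{\theta \in \Theta} E_{P_0} L(\theta, X),
\qquad
\hat\theta_n = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(\theta, X_i),

L(\theta, (W, Y)) = \bigl(Y - \theta(W)\bigr)^2 \quad \text{(regression, } X = (W, Y)\text{)},
\qquad
L(\theta, X) = -\log \theta(X) \quad \text{(density estimation)}.
```

For the squared error loss the risk minimizer is the conditional mean E(Y | W); for the negative log-density loss it is the true density, provided it lies in Theta.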
In this article, we propose a cross-validated epsilon-net
estimation methodology that covers a broad class of estimation problems,
including multivariate outcome prediction and multivariate density
estimation.
An epsilon-net sieve of a subspace of the parameter space is defined
as a collection of finite sets of points, the epsilon-nets, indexed by
epsilon, each of which approximates the subspace up to a resolution of
epsilon.
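An epsilon-net can be built in several ways. As an illustration only, the Python sketch below stands in for a subspace with a dense finite sample from the unit square (a hypothetical choice; the paper's construction may differ) and builds each net greedily: a point joins the net whenever it is farther than epsilon from every point already in it. Every sampled point is then within epsilon of the net, and a decreasing sequence of epsilon values yields a sieve. The helper name `greedy_epsilon_net` is our own.

```python
import math
import random


def greedy_epsilon_net(points, eps):
    """Greedy epsilon-net of a finite point cloud under Euclidean distance.

    A point is added only if it is farther than eps from every current net
    point, so on return every input point lies within eps of some net point.
    """
    net = []
    for p in points:
        if all(math.dist(p, q) > eps for q in net):
            net.append(p)
    return net


# Hypothetical stand-in for a subspace: a dense sample from the unit square.
rng = random.Random(1)
cloud = [(rng.random(), rng.random()) for _ in range(2000)]

# A sieve: one finite net per resolution eps.
sieve = {eps: greedy_epsilon_net(cloud, eps) for eps in (0.4, 0.2, 0.1)}
for eps, net in sieve.items():
    # Covering radius: worst-case distance from a sampled point to the net.
    radius = max(min(math.dist(p, q) for q in net) for p in cloud)
    print(f"eps={eps}: {len(net)} points, covering radius {radius:.3f}")
```

Finer resolutions produce larger nets, which is the trade-off the cross-validated choice of epsilon is meant to balance.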
Given a collection of subspaces of the parameter space,
one constructs an epsilon-net sieve for each of the subspaces.
For each choice of subspace and each value of the resolution epsilon,
one defines
a candidate estimator as the minimizer of the empirical
risk over the corresponding epsilon-net.
The cross-validated epsilon-net estimator is then defined as
the candidate estimator corresponding to the choice of subspace and
epsilon-value minimizing the cross-validated empirical risk.
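The selection rule can be sketched end to end under assumptions of our own: linear regression with squared error loss, subspaces indexed by k (use only the first k covariates), coefficients restricted to the box [-2, 2]^k, and sup-norm epsilon-nets given by a grid with spacing 2*epsilon. All function names are hypothetical, and the brute-force search is for illustration; it is not the implementation discussed in the paper.

```python
import itertools
import math
import random


def sup_norm_net(bound, dim, eps):
    """Grid with spacing 2*eps on [-bound, bound]^dim.

    Every point of the box lies within eps (sup norm) of some grid point.
    """
    steps = math.ceil(bound / eps)
    axis = [min(-bound + 2 * eps * j, bound) for j in range(steps + 1)]
    return list(itertools.product(axis, repeat=dim))


def empirical_risk(beta, rows):
    """Mean squared error; zip() truncates x to its first len(beta) covariates."""
    total = 0.0
    for x, y in rows:
        pred = sum(b * xi for b, xi in zip(beta, x))
        total += (y - pred) ** 2
    return total / len(rows)


def candidate(net, rows):
    """Candidate estimator: empirical risk minimizer over a finite net."""
    return min(net, key=lambda beta: empirical_risk(beta, rows))


def cv_epsilon_net(rows, dims, epsilons, bound=2.0, folds=5):
    """Pick (k, eps) by V-fold cross-validated risk, then refit on all rows."""
    best = None
    for k in dims:
        for eps in epsilons:
            net = sup_norm_net(bound, k, eps)
            cv_risk = 0.0
            for v in range(folds):
                train = [r for i, r in enumerate(rows) if i % folds != v]
                valid = [r for i, r in enumerate(rows) if i % folds == v]
                cv_risk += empirical_risk(candidate(net, train), valid) / folds
            if best is None or cv_risk < best[0]:
                best = (cv_risk, k, eps)
    _, k, eps = best
    return k, eps, candidate(sup_norm_net(bound, k, eps), rows)


# Simulated data: only the first two of three covariates matter.
rng = random.Random(0)
rows = []
for _ in range(60):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    y = 1.0 * x[0] - 0.5 * x[1] + rng.gauss(0, 0.1)
    rows.append((x, y))

k, eps, beta = cv_epsilon_net(rows, dims=[1, 2, 3], epsilons=[0.5, 0.25])
print(k, eps, beta)
```

In this simulated design the coarser grid cannot represent the coefficient -0.5, so cross-validation should favor the finer resolution. Because the grid has (ceil(bound/eps) + 1)^k points, exhaustive enumeration like this is only practical for small k.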
We derive a finite-sample inequality showing that the proposed
estimator achieves the adaptive optimal minimax rate of convergence;
the adaptivity comes from considering epsilon-net sieves for various
subspaces.
We also address the implementation of the cross-validated epsilon-net
estimation procedure.
In the context of a linear regression model, we present results of a
preliminary simulation study comparing the cross-validated epsilon-net
estimator to the cross-validated L^1-penalized least squares estimator
(LASSO) and the least angle regression estimator (LARS).
Finally, we discuss generalizations of the proposed estimation methodology
to censored data structures.
- Subject Area:
- Statistical Theory and Methods, Survival Analysis
- Suggested Citation:
- Mark J. van der Laan, Sandrine Dudoit, and Aad W. van der Vaart,
"The Cross-Validated Adaptive Epsilon-Net Estimator"
(February 2004).
U.C. Berkeley Division of Biostatistics Working Paper Series.
Working Paper 142.
http://www.bepress.com/ucbbiostat/paper142