- Resampling-based Multiple Testing: Asymptotic Control of Type I Error and Applications to Gene Expression Data
- Article comments:
- Published 2005 in J. Statistical Planning and Inference, 125, pp. 85-100.
- Abstract:
- We define a general statistical framework for multiple hypothesis testing
and show that the correct null distribution for the test statistics is
obtained by projecting the true distribution of the test statistics onto
the space of mean zero distributions. For common choices of test
statistics (based on an asymptotically linear parameter estimator), this
distribution is asymptotically multivariate normal with mean zero and the
covariance of the vector influence curve for the parameter estimator. This
test statistic null distribution can be estimated by applying the
non-parametric or parametric bootstrap to correctly centered test
statistics. We prove that this bootstrap estimated null distribution
provides asymptotic control of most type I error rates. We show that
obtaining a test statistic null distribution from a data null distribution
(e.g., projecting the data generating distribution onto the space of
all distributions satisfying the complete null) only provides the correct
test statistic null distribution if the covariance of the vector influence
curve is the same under the data null distribution as under the true data
distribution. This condition is a weak version of the subset pivotality
condition. We show that our multiple testing methodology
controlling type I error is equivalent to constructing an error-specific
confidence region for the true parameter and checking if it contains the
hypothesized value. We also study the two-sample problem and show that the
permutation distribution produces an asymptotically correct null
distribution if (i) the sample sizes are equal or (ii) the populations
have the same covariance structure. We include a discussion of the
application of multiple testing to gene expression data, where the
dimension typically far exceeds the sample size. An analysis of a cancer
gene expression data set illustrates the methodology.
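The bootstrap step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-sample setting with one mean per variable, two-sided t-statistics, and a single-step maximum-statistic cutoff for family-wise error control. The function names `t_statistics` and `bootstrap_max_t` and all parameters are hypothetical. The key point is that each bootstrap statistic is centered at the observed estimate rather than at the hypothesized value, so the resampled statistics have mean approximately zero while keeping the covariance structure of the data.

```python
import numpy as np


def t_statistics(X, center):
    """One-sample t-statistic for each column of X, centered at `center`."""
    n = X.shape[0]
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    return np.sqrt(n) * (mean - center) / sd


def bootstrap_max_t(X, mu0, n_boot=1000, alpha=0.05, seed=0):
    """Single-step max-|T| test using a bootstrap null distribution.

    Hypothetical helper: X is an (n, m) array, mu0 the hypothesized means
    (scalar or length-m array). Returns the observed statistics, the common
    cutoff, and a boolean rejection vector.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    observed = t_statistics(X, mu0)   # statistics centered at the null value
    estimate = X.mean(axis=0)         # observed parameter estimate
    max_abs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        # Center at the observed estimate, not at mu0, so the bootstrap
        # statistics are approximately mean zero.
        boot = t_statistics(X[idx], estimate)
        max_abs[b] = np.abs(boot).max()
    cutoff = np.quantile(max_abs, 1 - alpha)      # common cutoff for all tests
    reject = np.abs(observed) > cutoff
    return observed, cutoff, reject


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 20))
    X[:, :3] += 2.0                               # three variables with a true shift
    observed, cutoff, reject = bootstrap_max_t(X, mu0=0.0)
    print("cutoff:", cutoff, "rejections:", int(reject.sum()))
```

Resampling whole rows keeps the dependence between variables intact, which is what lets the maximum statistic reflect the joint distribution rather than the marginal ones.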
- Subject Area:
- Computation, Statistical Theory and Methods
- Suggested Citation:
- Katherine S. Pollard and Mark J. van der Laan,
"Resampling-based Multiple Testing: Asymptotic Control of Type I Error and Applications to Gene Expression Data"
(June 2003).
U.C. Berkeley Division of Biostatistics Working Paper Series.
Working Paper 121.
http://www.bepress.com/ucbbiostat/paper121
- Previous Versions:
December 02, 2002