- Paired and Unpaired Comparisons and Clustering with Gene Expression Data
- Jennifer F. Bryan, Dept. of Statistics & Biotechnology Lab, University of British Columbia
- Katherine S. Pollard, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Mark J. van der Laan, Division of Biostatistics, School of Public Health, University of California, Berkeley
- Article comments:
- Published in Statistica Sinica, 12(1):87-110, 2002.
- Abstract:
- We have previously described a statistical framework for using gene expression data from cDNA
microarrays to select meaningful subsets of genes and to place genes into clusters (van der Laan
and Bryan, 2001). In this paper we extend this methodology to the setting in which expression data
are collected on a common set of p genes either from two observations within a subject (paired) or from
subjects in two subpopulations (unpaired). We present simulation results that illustrate important
issues encountered in cluster analysis of gene expression data. In particular, we find that
sampling variability of the covariance structure and the presence of unrelated genes can have
a strong impact on clustering algorithms and on measures of cluster strength. We discuss ways
to address these issues, including a hybrid clustering method that incorporates both
partitioning and collapsing steps. The hybrid methodology is illustrated on a cancer cell line data
set comprising two types of cancer. We also present a method for selecting significantly differentially
expressed genes using a null distribution. Finally, we present theoretical results on
sample size and consistency in this setting.
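The abstract does not specify which null distribution the paper uses, so the following is only a generic illustration of the paired versus unpaired distinction. It computes a per-gene statistic for each design (the mean within-subject difference, or the difference of group means) and selects genes whose absolute statistic exceeds a permutation-based threshold. The data layout (rows are arrays or subjects, columns are the p genes), the sign-flip and label-permutation nulls, the max-over-genes threshold, and the function names `paired_stat`, `unpaired_stat`, and `select_genes` are all assumptions made for this sketch. They do not describe the authors' procedure.

```python
import numpy as np


def paired_stat(x, y):
    """Mean within-subject difference per gene.

    x, y: arrays of shape (n_subjects, p_genes); row i of x and row i of y
    are the two observations on the same subject (assumed layout).
    """
    return (x - y).mean(axis=0)


def unpaired_stat(x, y):
    """Difference in group means per gene for two independent samples.

    x: (n1, p_genes) from subpopulation 1; y: (n2, p_genes) from subpopulation 2.
    """
    return x.mean(axis=0) - y.mean(axis=0)


def select_genes(x, y, paired, n_perm=1000, alpha=0.05, seed=0):
    """Hypothetical helper: generic max-statistic permutation test.

    Returns (indices of selected genes, threshold), where the threshold is the
    (1 - alpha) quantile of the permutation null of max_j |statistic_j|.
    This is NOT the null distribution used in the paper.
    """
    rng = np.random.default_rng(seed)
    null_max = np.empty(n_perm)
    if paired:
        observed = paired_stat(x, y)
        d = x - y
        for b in range(n_perm):
            # Under H0 each subject's difference vector is symmetric about 0,
            # so flip its sign at random (one sign shared by all genes of a subject).
            signs = rng.choice([-1.0, 1.0], size=(d.shape[0], 1))
            null_max[b] = np.abs((signs * d).mean(axis=0)).max()
    else:
        observed = unpaired_stat(x, y)
        pooled = np.vstack([x, y])
        n1 = x.shape[0]
        for b in range(n_perm):
            # Under H0 group labels are exchangeable: shuffle rows, then re-split.
            perm = pooled[rng.permutation(pooled.shape[0])]
            null_max[b] = np.abs(unpaired_stat(perm[:n1], perm[n1:])).max()
    threshold = np.quantile(null_max, 1 - alpha)
    return np.flatnonzero(np.abs(observed) > threshold), threshold
```

In the paired design, sign flipping keeps each subject's two observations together, so each subject acts as its own control. In the unpaired design, whole arrays are reassigned between the two subpopulations. Comparing every gene against the maximum over all p genes is one standard way to control the family-wise error rate across the gene set.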
- Subject Area:
- Computational Biology/Bioinformatics, Microarrays, Multivariate Analysis, Statistical Theory and Methods
- Suggested Citation:
- Jennifer F. Bryan, Katherine S. Pollard, and Mark J. van der Laan,
"Paired and Unpaired Comparisons and Clustering with Gene Expression Data"
(June 2001).
U.C. Berkeley Division of Biostatistics Working Paper Series.
Working Paper 95.
http://www.bepress.com/ucbbiostat/paper95