- A Method to Identify Significant Clusters in Gene Expression Data
- Article comments:
- Published 2002 in Proceedings, SCI (World Multiconference on Systemics, Cybernetics and Informatics), V. II, 318-325.
- Abstract:
- Clustering algorithms have been widely applied to gene expression
data. For both hierarchical and partitioning clustering algorithms,
selecting the number of significant clusters is an important problem
and many methods have been proposed. Existing methods for selecting
the number of clusters tend to find only the global patterns in the
data (e.g., the over- and under-expressed genes). We have noted the
need for a better method in the gene expression context, where small,
biologically meaningful clusters can be difficult to identify. In this
paper, we define a new criterion, Mean Split Silhouette (MSS), which is
a measure of cluster heterogeneity. We propose to choose the number of
clusters as the minimizer of MSS. In this way, the number of
significant clusters is defined as that which produces the most
homogeneous clusters. The power of this method compared to existing
methods is demonstrated on simulated microarray data. The minimum MSS
method is an example of a general approach that can be applied to any
clustering routine with any global criterion.
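
The selection rule in the abstract (choose the number of clusters that minimizes MSS) can be illustrated with a small sketch. The abstract does not define MSS, so the code below rests on assumptions: the split silhouette of a cluster is taken to be the largest average silhouette width obtained when that cluster is itself re-clustered, and MSS is the mean of these values over the clusters. The clustering routine (a simple deterministic k-medoids), the helper names, the `max_split` parameter, the convention that clusters with fewer than three points score 0, and the toy data are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def k_medoids(points, k, max_iter=100):
    """Simple deterministic k-medoids (assumes distinct points, k <= len(points))."""
    # Farthest-first initialisation: start at the first point, then keep adding
    # the point that is farthest from the medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(
            range(len(points)),
            key=lambda i: min(math.dist(points[i], points[m]) for m in medoids)))
    for _ in range(max_iter):
        # Assign every point to its nearest medoid.
        clusters = [[] for _ in medoids]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, points[medoids[c]]))
            clusters[nearest].append(i)
        # Move each medoid to the member with the smallest total distance.
        new_medoids = [
            min(c, key=lambda i: sum(math.dist(points[i], points[j]) for j in c))
            for c in clusters]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return [[points[i] for i in c] for c in clusters]


def average_silhouette(clusters):
    """Mean silhouette width over all points (needs >= 2 clusters); singletons score 0."""
    widths = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                widths.append(0.0)
                continue
            # a: mean distance to the other members of the point's own cluster.
            a = sum(math.dist(p, q) for q in cluster) / (len(cluster) - 1)
            # b: smallest mean distance to the members of any other cluster.
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


def split_silhouette(cluster, max_split=3):
    """Assumed definition: best average silhouette when the cluster is split again."""
    if len(cluster) < 3:
        return 0.0  # assumption: a cluster this small is treated as homogeneous
    top = min(max_split, len(cluster) - 1)
    return max(average_silhouette(k_medoids(cluster, k2)) for k2 in range(2, top + 1))


def mean_split_silhouette(points, k, max_split=3):
    """MSS for a k-cluster solution: mean split silhouette over its clusters."""
    clusters = k_medoids(points, k)
    return sum(split_silhouette(c, max_split) for c in clusters) / len(clusters)


def choose_k(points, candidates, max_split=3):
    """Return the candidate number of clusters with the smallest MSS."""
    return min(candidates, key=lambda k: mean_split_silhouette(points, k, max_split))


# Toy one-dimensional data (hypothetical): three groups of five points each.
points = [(start + offset,) for start in (0.0, 30.0, 200.0) for offset in range(5)]
for k in (2, 3):
    print(k, round(mean_split_silhouette(points, k), 3))
print("chosen k:", choose_k(points, [2, 3]))
```

The sketch compares only k = 2 and k = 3. With k = 2 the first two groups are merged into one cluster, which splits cleanly and therefore has a high split silhouette, so the MSS is larger than for k = 3. On data this small, larger values of k can score artificially low because tiny clusters cannot be split further, so the toy comparison should not be read as evidence about the method's behavior on real microarray data.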
- Subject Area:
- Microarrays, Multivariate Analysis, Statistical Theory and Methods
- Suggested Citation:
- Katherine S. Pollard and Mark J. van der Laan,
"A Method to Identify Significant Clusters in Gene Expression Data"
(April 2002).
U.C. Berkeley Division of Biostatistics Working Paper Series.
Working Paper 107.
http://www.bepress.com/ucbbiostat/paper107