Estimating Number of Clusters Based on a General Similarity Matrix with Application to Microarray Data

Shafagh Fallah, University of Toronto
David Tritchler, University Health Network, Toronto; University of Toronto; and SUNY at Buffalo
Joseph Beyene, Hospital for Sick Children Research Institute and University of Toronto

Abstract

Many clustering methods require that the number of clusters believed to be present in a data set be specified a priori, and a number of methods for estimating the number of clusters have been developed. However, selecting the number of clusters is well recognized as a difficult and open problem, and there is a need for methods that can shed light on specific aspects of the data. This paper adopts a model for clustering based on a specific structure for a similarity matrix. Publicly available gene expression data sets are analyzed to illustrate the method, and its performance is assessed by simulation.

Submitted: October 15, 2006 · Accepted: July 4, 2008 · Published: August 2, 2008

Recommended Citation

Fallah, Shafagh; Tritchler, David; and Beyene, Joseph (2008) "Estimating Number of Clusters Based on a General Similarity Matrix with Application to Microarray Data," Statistical Applications in Genetics and Molecular Biology: Vol. 7 : Iss. 1, Article 24.
Available at: http://www.bepress.com/sagmb/vol7/iss1/art24

ISSN: 1544-6115 ©1999-2008 The Berkeley Electronic Press™ All rights reserved.
