A Classification Model for the Leiden Proteomics Competition

Huub C. J. Hoefsloot, University of Amsterdam
Suzanne Smit, University of Amsterdam
Age K. Smilde, University of Amsterdam

Abstract

A strategy is presented for building a discrimination model in proteomics studies. The model is built using cross-validation, and this cross-validation step can easily be combined with a variable selection method called rank products. The strategy is especially suitable for the case of a low samples-to-variables ratio (undersampling), as is often encountered in proteomics and metabolomics studies. Principal Component Discriminant Analysis is used as the classification method; however, the methodology can be used with any classifier. A data set containing serum samples from breast cancer patients and healthy controls is analysed. Double cross-validation shows that the sensitivity of the model is 82% and the specificity 86%. Putative biomarkers are identified using the variable selection method. In each cross-validation loop a classification model is built, and the final classification is made by majority voting over the resulting ensemble of classifiers.
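
The three building blocks named in the abstract (rank products, majority voting over the cross-validation models, and sensitivity/specificity) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the use of a per-loop importance score for each variable, the geometric mean of ranks as the rank product, the 0/1 class coding (1 = patient, 0 = control), and the tie-breaking rules are all assumptions made for illustration.

```python
from math import prod


def rank_product(scores_per_loop):
    """Combine variable rankings from several cross-validation loops.

    scores_per_loop[k][j] is a hypothetical importance score of variable j
    in loop k (higher = more important). Returns, per variable, the
    geometric mean of its ranks; a low value means the variable is ranked
    near the top consistently. Ties are broken by variable index.
    """
    n_loops = len(scores_per_loop)
    n_vars = len(scores_per_loop[0])
    ranks = []
    for scores in scores_per_loop:
        order = sorted(range(n_vars), key=lambda j: -scores[j])
        loop_ranks = [0] * n_vars
        for position, j in enumerate(order, start=1):
            loop_ranks[j] = position  # rank 1 = most important
        ranks.append(loop_ranks)
    return [prod(r[j] for r in ranks) ** (1.0 / n_loops) for j in range(n_vars)]


def majority_vote(votes):
    """Return 1 if more than half of the ensemble votes 1, else 0.

    Assumption: an exact tie is assigned to class 0.
    """
    return 1 if 2 * sum(votes) > len(votes) else 0


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (invented for illustration): 3 loops, 3 variables.
toy_scores = [[0.9, 0.1, 0.5], [0.8, 0.3, 0.2], [0.7, 0.2, 0.6]]
rp = rank_product(toy_scores)
# Variable 0 is ranked first in every loop, so it has the lowest rank product.
best_variable = min(range(len(rp)), key=lambda j: rp[j])
```

In this sketch each cross-validation loop contributes one vote per sample and one ranking of the variables, so the same loops serve both classification and variable selection.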

Submitted: January 18, 2008 · Accepted: January 26, 2008 · Published: February 19, 2008

Recommended Citation

Hoefsloot, Huub C. J.; Smit, Suzanne; and Smilde, Age K. (2008) "A Classification Model for the Leiden Proteomics Competition," Statistical Applications in Genetics and Molecular Biology: Vol. 7 : Iss. 2, Article 8.
DOI: 10.2202/1544-6115.1351
Available at: http://www.bepress.com/sagmb/vol7/iss2/art8

ISSN: 1544-6115 ©1999-2009 The Berkeley Electronic Press™ All rights reserved.
