Two-Stage Model-Based Clustering for Liquid Chromatography Mass Spectrometry Data Analysis
Abstract
Proteomic mass spectrometry plays an increasingly important role in diagnostics and in studies of protein complexes and biological systems. This experimental technology produces high-throughput data that are inherently noisy and may contain various errors. Mathematical processing can help remove them.
In this paper we focus on the peak alignment problem in LC-MS spectra. As an alternative to heuristic approaches to the problem, we propose a mathematically sound method based on model-based clustering. In this framework, experimental errors are modeled as deviations from the true values, and mass spectra are regarded as finite Gaussian mixtures. The advantage of this approach is that it provides convenient techniques for adjusting parameters and for selecting the best-quality solutions. The method can be parameterized by imposing various constraints on the model. We investigate and compare different classes of models, and we evaluate the results in terms of the statistically significant biomarkers that can be identified after the spectra are aligned. The study was conducted on a dataset of plasma samples from colorectal cancer patients and healthy donors.
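As a rough illustration of the general idea of treating observed peak positions as draws from a finite Gaussian mixture and comparing constrained model classes, the following minimal sketch fits one-dimensional mixtures by EM and compares them. It is not the two-stage method of the paper. The use of BIC as the selection criterion, the two variance constraints (free versus shared), the synthetic peak positions, and all function and variable names are assumptions made here for illustration only.

```python
import math
import random


def _e_step(xs, weights, means, variances):
    """Return (responsibilities, log-likelihood) under the current parameters."""
    resp, loglik = [], 0.0
    for x in xs:
        logs = [
            math.log(max(w, 1e-300))
            - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(logs)
        lse = top + math.log(sum(math.exp(l - top) for l in logs))  # log-sum-exp
        resp.append([math.exp(l - lse) for l in logs])
        loglik += lse
    return resp, loglik


def fit_gmm_1d(xs, k, shared_variance=False, iters=200, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM and report its BIC."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    total_var = sum((x - grand_mean) ** 2 for x in xs) / n
    variances = [max(total_var / k ** 2, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp = _e_step(xs, weights, means, variances)[0]
        nk = [max(sum(r[j] for r in resp), 1e-12) for j in range(k)]
        weights = [c / n for c in nk]
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j] for j in range(k)]
        ss = [sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs))
              for j in range(k)]
        if shared_variance:
            # Constrained model class: one variance common to all components.
            variances = [max(sum(ss) / n, var_floor)] * k
        else:
            variances = [max(ss[j] / nk[j], var_floor) for j in range(k)]
    loglik = _e_step(xs, weights, means, variances)[1]
    # Free parameters: mixing weights, means, and one or k variances.
    n_params = (k - 1) + k + (1 if shared_variance else k)
    bic = -2.0 * loglik + n_params * math.log(n)
    return {"k": k, "shared_variance": shared_variance, "weights": weights,
            "means": means, "variances": variances, "loglik": loglik, "bic": bic}


# Synthetic example: three hypothetical peak positions, each observed 40 times
# with a small Gaussian measurement error (values chosen for illustration).
rng = random.Random(0)
true_centres = [500.0, 500.6, 501.5]
observed = [rng.gauss(c, 0.05) for c in true_centres for _ in range(40)]

fits = {(k, shared): fit_gmm_1d(observed, k, shared)
        for k in range(1, 5) for shared in (False, True)}
best = min(fits.values(), key=lambda f: f["bic"])  # lower BIC is preferred
print(best["k"], best["shared_variance"], [round(m, 3) for m in best["means"]])
```

In this sketch each fitted component plays the role of one aligned peak, and the shared-variance option stands in for the kind of constraint that defines a model class. Real LC-MS peaks carry at least two coordinates (mass-to-charge ratio and retention time), so a practical version would need multivariate components.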
Submitted: June 6, 2007 · Accepted: January 9, 2009 · Published: February 12, 2009
Recommended Citation
Łuksza, Marta; Kluge, Bogusław; Ostrowski, Jerzy; Karczmarski, Jakub; and Gambin, Anna (2009) "Two-Stage Model-Based Clustering for Liquid Chromatography Mass Spectrometry Data Analysis," Statistical Applications in Genetics and Molecular Biology: Vol. 8: Iss. 1, Article 15.
DOI: 10.2202/1544-6115.1308
Available at: http://www.bepress.com/sagmb/vol8/iss1/art15
