Article ID: 429434
Journal: Journal of Computational Science
Published Year: 2012
Pages: 12 Pages
File Type: PDF
Abstract

Advanced statistical techniques and data mining methods have been recognized as a powerful support for mass spectrometry (MS) data analysis. In particular, owing to their unsupervised learning nature, data clustering methods have attracted increasing interest for exploring, identifying, and discriminating pathological cases from MS clinical samples. Supporting biomarker discovery in protein profiles has drawn special attention from biologists and clinicians. However, the huge amount of information contained in a single sample, that is, the high dimensionality of MS data, makes the effective identification of biomarkers a challenging problem.

In this paper, we present a data mining approach for the analysis of MS data, in which the mining phase is developed as a task of clustering of MS data. Under the natural assumption of modeling MS data as time series, we propose a new representation model of MS data which significantly reduces the high dimensionality of such data while preserving the relevant features. Besides reducing dimensionality (high dimensionality typically affects the effectiveness and efficiency of computational methods), the proposed representation model also alleviates the critical task of preprocessing the raw spectra in the overall process of MS data analysis. We evaluated our MS data clustering approach on publicly available proteomic datasets, and the experimental results show the effectiveness of the proposed approach, which can be used to aid clinicians in studying pathological states and formulating diagnoses.
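The abstract does not specify the representation model, so the following is only a minimal sketch of the general idea of treating a spectrum as a series, compressing it, and clustering the compressed form. It assumes piecewise aggregate approximation (PAA) as a stand-in reduction and plain k-means as the clustering step; neither is claimed to be the authors' method. The function names `paa`, `sq_dist`, and `kmeans` and the toy intensity values are hypothetical.

```python
from statistics import mean


def paa(intensities, n_segments):
    """Piecewise aggregate approximation: mean intensity per near-equal-width segment.

    Stand-in for a dimensionality-reducing representation (an assumption,
    not the representation model proposed in the paper).
    """
    n = len(intensities)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(intensities)")
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [mean(intensities[a:b]) for a, b in zip(bounds, bounds[1:])]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, n_iter=20):
    """Plain k-means; the first k points seed the centroids, so runs are deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


# Toy spectra (hypothetical intensities ordered by m/z): two with an early peak,
# two with a late peak, interleaved.
spectra = [
    [0, 1, 9, 8, 1, 0, 0, 1],
    [0, 1, 0, 1, 1, 0, 9, 8],
    [1, 0, 7, 9, 0, 2, 1, 0],
    [1, 0, 1, 0, 0, 2, 8, 9],
]
reduced = [paa(s, 4) for s in spectra]  # 8 values per spectrum compressed to 4
labels = kmeans(reduced, k=2)
print(labels)
```

In this toy setting the compressed spectra keep the peak location, so spectra with the same peak position fall into the same cluster even though half of the original values were discarded.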

► Advanced statistics and data mining have been recognized as a powerful support for mass spectrometry (MS) data analysis.
► We present a data mining approach for the analysis of MS data.
► We propose a new representation model of MS data that reduces the high-dimensionality of such data, while preserving the relevant features.
► The proposed model also alleviates the critical task of preprocessing the raw spectra.
► Experimental results show that our approach can be used to aid clinicians in studying and formulating diagnosis of pathological states.

Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics