An efficient approach for compound identification based on the frequency features of mass spectra

Article ID	Journal	Published Year	Pages	File Type
1180513	Chemometrics and Intelligent Laboratory Systems	2015	7 Pages	PDF

Abstract

• A nonzero feature-retention strategy is proposed to decrease the dimensionality.
• A correlation-based filtering strategy is devised to improve the efficiency.
• A two-stage similarity measure scheme is designed to reduce the computation burden.
• The accuracy of the proposed method is competitive with that of the existing methods.
• The computation time of the proposed method is far less than that of the existing methods.

Similarity-measure-based spectrum matching is an effective approach to chemical compound identification. As the query library and the reference library both grow large, most existing spectrum-matching methods face a heavy computational burden. In this paper, an effective and efficient compound-identification approach is proposed based on the frequency features of mass spectra. Considering the sparsity of mass spectra, a nonzero feature-selection strategy is proposed to decrease their feature dimensionality. To further improve efficiency, a correlation-based filtering strategy is presented that selects the most correlated reference spectra to create a reduced reference library. Based on the decreased features and the reduced reference library, frequency-feature-based composite similarity measures are computed to estimate the Chemical Abstracts Service (CAS) registry numbers of the mass spectra in a query library. Owing to the reduction in both the feature dimensionality and the reference library, the computation time of the proposed method is only about 6%–11% of that of the existing methods, while the identification performance remains competitive. Experimental results demonstrate the feasibility and efficiency of the proposed method.
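
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so several choices here are assumptions made purely for illustration. Nonzero feature retention is taken to mean keeping the m/z channels where the query spectrum is nonzero, the filter uses the Pearson correlation, the frequency features are the real part of a discrete Fourier transform, and the composite score is an equally weighted sum of two cosine similarities. The function names, the `top_k` and `weight` parameters, the toy spectra, and the `ref-A`/`ref-B`/`ref-C` labels standing in for CAS registry numbers are all hypothetical.

```python
import math


def nonzero_indices(query):
    """Indices of the m/z channels where the query has nonzero intensity."""
    return [i for i, x in enumerate(query) if x != 0]


def pearson(a, b):
    """Pearson correlation; 0.0 if either vector has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def dft_real(x):
    """Real part of the DFT (naive O(n^2) version, adequate for a sketch)."""
    n = len(x)
    return [
        sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(x))
        for k in range(n)
    ]


def identify(query, library, top_k=2, weight=0.5):
    """Return the label of the best-matching reference spectrum.

    library maps a label (standing in for a CAS number) to an intensity
    vector of the same length as the query.
    """
    # Dimensionality reduction (assumed form): keep nonzero query channels.
    keep = nonzero_indices(query)
    q = [query[i] for i in keep]
    reduced = {label: [spec[i] for i in keep] for label, spec in library.items()}

    # Stage 1: correlation-based filtering down to a reduced reference library.
    shortlist = sorted(
        reduced, key=lambda label: pearson(q, reduced[label]), reverse=True
    )[:top_k]

    # Stage 2: composite similarity (assumed weighting) on the shortlist only.
    q_freq = dft_real(q)

    def composite(label):
        r = reduced[label]
        return weight * cosine(q, r) + (1 - weight) * cosine(q_freq, dft_real(r))

    return max(shortlist, key=composite)


# Toy spectra, invented for illustration only.
query = [0, 10, 0, 50, 0, 0, 100, 0]
library = {
    "ref-A": [0, 12, 0, 45, 0, 5, 100, 0],
    "ref-B": [100, 0, 30, 0, 0, 60, 0, 5],
    "ref-C": [0, 100, 0, 20, 0, 0, 10, 0],
}
best = identify(query, library)
print(best)
```

In this sketch the more expensive composite score is evaluated only for the `top_k` references that survive the correlation filter, and every vector is shortened to the retained channels first. This mirrors the two sources of savings the abstract names, although the actual measures used in the paper may differ.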

Keywords