Extracting homologous series from mass spectrometry data by projection on predefined vectors

Chemometrics and Intelligent Laboratory Systems, 2012 (8 pages). Article ID 1181491.

Abstract

Multivariate statistical methods such as Principal Component Analysis (PCA) have been used extensively over the past decades to extract significant information from complex data sets. They are very powerful and, combined with an understanding of the underlying chemical principles, have enabled researchers to develop useful models. A drawback of these methods, however, is that they cannot incorporate a physical or chemical model of the system under study into the statistical analysis. In this paper we present a method that complements traditional chemometric tools for finding patterns in mass spectrometry data. The method uses a predefined set of equally spaced sequences that are assumed to be present in the data. The measured spectra are projected onto this set, with some tolerance in the peak locations to allow for the measurement uncertainty of the instrumentation. We show that the resulting scores can be used to identify homologous series in the measured mass spectra that differ significantly between samples. Unlike in PCA, the loading vectors, here the predefined homologous series, are readily interpretable.
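The projection idea can be illustrated with a minimal sketch. It assumes that each spectrum is a list of (m/z, intensity) peaks, that a series is the set of positions start + k·Δ with Δ = 14.01565 Da (the CH2 repeat unit), and that each series member picks up the most intense peak within ±tol of its nominal position. The names `series_positions`, `project` and `score_table`, the max-in-window matching, the unit-norm scaling and all numbers are illustrative assumptions, not necessarily the authors' exact scoring.

```python
import math
from typing import Dict, List, Sequence, Tuple

Spectrum = Sequence[Tuple[float, float]]  # (m/z, intensity) peak list

CH2 = 14.01565  # monoisotopic mass of the CH2 repeat unit, in Da


def series_positions(start: float, spacing: float, mz_max: float) -> List[float]:
    """Nominal m/z values start, start + spacing, ... not exceeding mz_max."""
    positions = []
    k = 0
    while start + k * spacing <= mz_max:
        positions.append(start + k * spacing)
        k += 1
    return positions


def project(spectrum: Spectrum, positions: Sequence[float], tol: float) -> float:
    """Score of one spectrum on one predefined series (hypothetical scoring).

    Each series member takes the most intense peak within +/- tol of its
    nominal position (absorbing instrumental mass error), or 0 if none.
    The series is treated as a unit-norm indicator vector, so the summed
    intensity is divided by sqrt(number of members).
    """
    if not positions:
        return 0.0
    total = 0.0
    for p in positions:
        total += max((i for mz, i in spectrum if abs(mz - p) <= tol), default=0.0)
    return total / math.sqrt(len(positions))


def score_table(spectra: Dict[str, Spectrum],
                series: Dict[str, List[float]],
                tol: float) -> Dict[str, Dict[str, float]]:
    """Scores for every sample (outer key) on every series (inner key)."""
    return {name: {s: project(spec, pos, tol) for s, pos in series.items()}
            for name, spec in spectra.items()}


# Toy example: one series starting near the C3H7+ fragment mass and one
# purely hypothetical series "B"; peak lists are made-up data.
series = {
    "alkyl": series_positions(43.0542, CH2, 100.0),
    "B": series_positions(46.0, CH2, 100.0),
}
spectra = {
    "sample1": [(43.05, 100.0), (57.07, 80.0), (71.09, 60.0), (85.10, 40.0), (60.02, 50.0)],
    "sample2": [(43.05, 10.0), (60.02, 90.0), (74.03, 70.0)],
}
scores = score_table(spectra, series, tol=0.01)
```

In this toy case sample1 scores highest on the "alkyl" series and sample2 on series "B". The rows of the score table play the role of a PCA score matrix, but each column is tied to a named, predefined series, so its meaning is known in advance. Normalizing by the square root of the number of members keeps series of different lengths on a comparable scale.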

► A new model-based decomposition of mass spectrometry data into homologous series.
► The method provides fingerprinting and clustering performance similar to that of PCA.
► The loading vectors are immediately interpretable in terms of the underlying chemical compounds.
► An application example on a set of bio-oils is provided to demonstrate the principle.

Keywords

Fingerprint; Principal components; Bio oil; Mass spectrometry; Chemometrics