Article ID: 1168352
Journal: Analytica Chimica Acta
Published Year: 2009
Pages: 9
File Type: PDF
Abstract

A major limitation of principal component regression (PCR) in QSAR studies is that the model extracts its eigenvectors solely from the descriptor matrix, so the resulting components are not necessarily well correlated with the biological activity. This article describes a novel segmentation approach to PCR (SPCAR), in which the descriptors are first segmented into blocks and principal component analysis (PCA) is then applied to each segment to extract the significant principal components (PCs). In this way, PCs carrying useful information are separated from those carrying redundant information. A linear regression analysis based on stepwise variable selection is then used to relate the informative extracted PCs to the biological activity. The proposed method was first applied to model the aqueous toxicity of aliphatic compounds, and the effect of the number of segments on its predictive ability was investigated. Finally, a correlation analysis was carried out to identify the descriptors that contribute significantly to the selected PCs and to aqueous toxicity. The method was further validated on the Selwood data set, which consists of 31 compounds and 53 descriptors. A comparison between the conventional PCR algorithm and SPCAR shows the superiority of the latter: for the external prediction set, SPCAR met all the requirements for a predictive model, whereas PCR did not. In addition, the models obtained by SPCAR were compared with those reported previously.
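
The workflow summarized in the abstract (segment the descriptors, run PCA per segment, then select PCs stepwise for a linear model) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The function names `segment_pcs` and `forward_select`, the contiguous equal-width segmentation, the 95% explained-variance cutoff, and the forward-only selection rule based on a minimum gain in R² are all assumptions, since the abstract does not specify these details.

```python
import numpy as np

def segment_pcs(X, n_segments, var_threshold=0.95):
    """Run PCA separately on each block of descriptor columns.

    Assumption: blocks are contiguous, roughly equal-width column groups,
    and PCs are kept until var_threshold of the block variance is explained.
    Returns the PC scores of all blocks stacked side by side.
    """
    # Autoscale descriptors (zero mean, unit variance) before PCA.
    Xc = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    scores = []
    for block in np.array_split(np.arange(X.shape[1]), n_segments):
        # SVD of the scaled block: scores are U * s.
        U, s, _ = np.linalg.svd(Xc[:, block], full_matrices=False)
        explained = np.cumsum(s**2) / np.sum(s**2)
        k = min(int(np.searchsorted(explained, var_threshold)) + 1, len(s))
        scores.append(U[:, :k] * s[:k])
    return np.hstack(scores)

def forward_select(T, y, max_terms=5, min_gain=0.01):
    """Forward selection of PC score columns for a linear model.

    Assumption: a PC is added only if it raises R^2 by at least min_gain.
    A full stepwise procedure would also test terms for removal.
    """
    n = len(y)
    tss = np.sum((y - y.mean()) ** 2)
    selected, best_r2 = [], 0.0
    while len(selected) < max_terms:
        candidates = []
        for j in range(T.shape[1]):
            if j in selected:
                continue
            # Least-squares fit with intercept on the trial set of PCs.
            A = np.column_stack([np.ones(n), T[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            r2 = 1.0 - np.sum((y - A @ coef) ** 2) / tss
            candidates.append((r2, j))
        if not candidates:
            break
        r2, j = max(candidates)
        if r2 - best_r2 < min_gain:
            break
        selected.append(j)
        best_r2 = r2
    return selected, best_r2
```

With a descriptor matrix `X` (compounds by descriptors) and an activity vector `y`, a call such as `forward_select(segment_pcs(X, 3), y)` would return the indices of the selected PCs and the fitted R². Repeating this for several values of `n_segments` is one way to examine how the number of segments affects the model, in the spirit of the study described above.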

Related Topics
Physical Sciences and Engineering > Chemistry > Analytical Chemistry