Article ID: 7562487
Journal: Chemometrics and Intelligent Laboratory Systems
Published Year: 2017
Pages: 29 Pages
File Type: PDF
Abstract
In recent years, there has been growing concern about how to evaluate the quality of predictions made by quantitative structure-activity relationship (QSAR) models. A well-defined applicability domain (AD) is crucial to the validation of QSAR models, as stated in the third principle of the Organisation for Economic Co-operation and Development (OECD). In this study, a new perspective on defining the AD of a model is proposed, based on a population analysis (PA) strategy comprising model population analysis (MPA) and approach population analysis (APA). MPA applies classical AD approaches to a large number of sub-datasets derived from the training set. With MPA, the classical AD approaches can identify some samples that they cannot identify when the AD is defined with the full training set. APA then takes the union of the results generated by all the AD approaches used, giving a consensus list of samples that fall outside the AD. To investigate the performance of the PA strategy in defining the AD with classical AD approaches, two QSAR datasets were used. The results show that the PA strategy helps three classical AD approaches to identify additional samples that cannot be identified using the full training dataset. When these additional samples were excluded, the root mean square error of prediction of the test set decreased, suggesting that the PA strategy has the potential to identify samples that cannot be reliably predicted.
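The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three classical AD approaches or the rule used to combine sub-dataset results, so the code below assumes two common stand-ins (a leverage cutoff and a distance-to-centroid cutoff) and a hypothetical frequency rule (`min_freq`) for deciding when a sample counts as outside the AD. The function names, the number of sub-datasets, and the sampling ratio are likewise illustrative assumptions.

```python
import numpy as np


def leverage_outside(X_train, X_test):
    """Assumed AD approach 1: flag test samples whose leverage exceeds
    the commonly used warning value 3 * (p + 1) / n."""
    n, p = X_train.shape
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    Tc = X_test - mean
    G = np.linalg.pinv(Xc.T @ Xc)
    h = np.einsum("ij,jk,ik->i", Tc, G, Tc)  # leverage of each test sample
    return h > 3.0 * (p + 1) / n


def centroid_outside(X_train, X_test):
    """Assumed AD approach 2: flag test samples farther from the training
    centroid than mean + 3 * std of the training distances."""
    c = X_train.mean(axis=0)
    d_train = np.linalg.norm(X_train - c, axis=1)
    d_test = np.linalg.norm(X_test - c, axis=1)
    return d_test > d_train.mean() + 3.0 * d_train.std()


def mpa_outside(X_train, X_test, approach,
                n_sub=500, ratio=0.8, min_freq=0.5, seed=0):
    """MPA stage: apply one AD approach to many random sub-datasets of the
    training set. The min_freq voting rule is a hypothetical choice."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    k = max(2, int(ratio * n))
    counts = np.zeros(X_test.shape[0])
    for _ in range(n_sub):
        idx = rng.choice(n, size=k, replace=False)  # one sub-dataset
        counts += approach(X_train[idx], X_test)
    return counts / n_sub >= min_freq


def apa_outside(X_train, X_test, approaches, **kwargs):
    """APA stage: union of the samples flagged by every AD approach."""
    flags = [mpa_outside(X_train, X_test, a, **kwargs) for a in approaches]
    return np.logical_or.reduce(flags)
```

Calling `apa_outside(X_train, X_test, [leverage_outside, centroid_outside])` returns a boolean mask over the test samples. Excluding the flagged samples before computing the test-set error mirrors the comparison described in the abstract, although the thresholds here are placeholders and not values from the study.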
Related Topics
Physical Sciences and Engineering > Chemistry > Analytical Chemistry