Article code: 4416903
Journal code: 1307805
Publication year: 2006
English article: 10-page PDF
Full-text version: free download
English title of the ISI article
The importance of outlier detection and training set selection for reliable environmental QSAR predictions
Related topics
Life Sciences and Biotechnology, Environmental Science, Environmental Chemistry
English abstract

Empirical QSAR models are only valid in the domain in which they were trained and validated. Application of the model to substances outside the domain of the model can lead to grossly erroneous predictions. Partial least squares (PLS) regression provides tools for prediction diagnostics that can be used to decide whether or not a substance is within the model domain, i.e. whether the model prediction can be trusted. QSAR models for four different environmental end-points are used to demonstrate the importance of appropriate training set selection and how the reliability of QSAR predictions can be increased by outlier diagnostics. All models showed consistent results: test set prediction errors were very similar in magnitude to training set estimation errors when prediction outlier diagnostics were used to detect and remove outliers in the prediction data. Test set prediction errors for substances classified as outliers were much larger. The difference in the number of outliers between models with randomly and systematically selected training sets illustrates well the need for representative training data.
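
To make the idea of a prediction outlier diagnostic concrete, the following is a minimal sketch, not the authors' procedure. It assumes a one-component PLS model with a single response, and it flags a new substance when its residual distance to the model in descriptor space is large relative to the pooled training residual. The function names, the toy descriptor data, the degrees-of-freedom convention, and the cutoff of 2.0 are all illustrative assumptions. In practice a critical limit would normally be derived statistically rather than fixed by hand.

```python
import math

def fit_pls1_one_component(X, y):
    """Fit a one-component PLS model (single response) on mean-centered data."""
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(k)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector w is proportional to X'y, scaled to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(k)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = Xw, X-loadings p = X't / t't, y-loading q = y't / t't.
    t = [sum(Xc[i][j] * w[j] for j in range(k)) for i in range(n)]
    tt = sum(v * v for v in t)
    p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
    q = sum(yc[i] * t[i] for i in range(n)) / tt
    # Pooled residual standard deviation of the training set in X-space.
    # The (n - 2) * (k - 1) degrees of freedom are an assumed convention.
    ss = sum((Xc[i][j] - t[i] * p[j]) ** 2 for i in range(n) for j in range(k))
    s0 = math.sqrt(ss / ((n - 2) * (k - 1)))
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "p": p, "q": q, "s0": s0}

def predict_with_diagnostics(model, x, limit=2.0):
    """Return (prediction, normalized distance to model, outlier flag)."""
    k = len(x)
    xc = [x[j] - model["x_mean"][j] for j in range(k)]
    t_new = sum(xc[j] * model["w"][j] for j in range(k))
    y_hat = model["y_mean"] + model["q"] * t_new
    # Part of x that the model cannot reproduce from its score.
    resid = math.sqrt(sum((xc[j] - t_new * model["p"][j]) ** 2 for j in range(k)) / (k - 1))
    ratio = resid / model["s0"]
    return y_hat, ratio, ratio > limit  # limit is an arbitrary illustrative cutoff

# Hypothetical training set: three correlated descriptors and one end-point.
X_train = [[1.0, 2.0, 0.5], [2.0, 4.1, 1.0], [3.0, 5.9, 1.6],
           [4.0, 8.0, 2.0], [5.0, 10.1, 2.4], [6.0, 12.0, 3.1]]
y_train = [1.1, 2.0, 3.1, 3.9, 5.0, 6.1]
model = fit_pls1_one_component(X_train, y_train)

inside = predict_with_diagnostics(model, [3.5, 7.0, 1.75])   # follows the training pattern
outside = predict_with_diagnostics(model, [3.5, 1.0, 6.0])   # breaks the descriptor pattern
print("inside :", inside)
print("outside:", outside)
```

The second substance has a descriptor pattern unlike anything in the training set, so its distance to the model is many times the training residual and its prediction would be set aside as unreliable, while the first is treated as being inside the model domain.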

Publisher
Database: Elsevier - ScienceDirect
Journal: Chemosphere - Volume 63, Issue 1, March 2006, Pages 99–108