Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
5132095 | 1491478 | 2017 | 11-page PDF | Free download |

We examined the performance of several popular machine learning methods, including multiple linear regression (MLR), support vector regression (SVR) and Gaussian process regression (GPR), for quantitative prediction of estrogen receptor (ER) binding affinity. Particular attention was devoted to compiling an accurate and precise dataset of 1589 experimental binding affinities (logRBA) from the Estrogenic Activity Database to train and validate the models. We address issues related to the accuracy and precision of the experimental data and to the choice between binding affinity data measured on human ER and data measured across species. The SVR and GPR models performed best, with root-mean-square errors below 1 log unit, comparable to experimental precision. We further examined the use of functional group analysis, nearest-neighbour distance and confidence interval estimates to flag large outliers. This work provides a test set of precise experimental data that may be used in future benchmarking studies of other machine learning models or free energy calculations.
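As a rough illustration of two quantities named in the abstract, the sketch below computes a root-mean-square error and flags query compounds by nearest-neighbour distance to the training set. It is not the authors' implementation: the function names, the two-dimensional descriptor vectors, the logRBA values and the distance threshold are all hypothetical and chosen only to make the example runnable.

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def nearest_neighbour_distance(query, training_set):
    """Euclidean distance from `query` to its closest training descriptor vector."""
    return min(math.dist(query, x) for x in training_set)

def flag_outliers(queries, training_set, threshold):
    """Return indices of queries farther than `threshold` from every training point."""
    return [
        i for i, q in enumerate(queries)
        if nearest_neighbour_distance(q, training_set) > threshold
    ]

# Hypothetical two-dimensional descriptors; made-up values for illustration only.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
queries = [(0.5, 0.5), (5.0, 5.0)]

# Assumed threshold of 2.0 descriptor units; a real study would calibrate this.
flagged = flag_outliers(queries, training, threshold=2.0)

# Hypothetical logRBA values, not taken from the paper's dataset.
observed_log_rba = [-1.0, 0.5, 1.5]
predicted_log_rba = [-0.5, 0.5, 1.0]
error = rmse(predicted_log_rba, observed_log_rba)
```

Here the second query lies far from every training point, so a prediction for it would be treated as less reliable. In practice the descriptors, distance metric and threshold would depend on the molecular representation used.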
Journal: Chemical Data Collections - Volumes 9–10, August 2017, Pages 114-124