Article ID: 4419933 | Journal: Ecotoxicology and Environmental Safety | Published Year: 2014 | Pages: 8 | File Type: PDF
Abstract

• Seven fingerprints and four machine learning methods were applied to predict EDCs.
• Binary classification models were obtained with high predictive accuracy.
• Phenol, trifluoromethyl, annelated rings, etc. were identified as substructure alerts of EDCs.

Rapidly and correctly identifying endocrine-disrupting chemicals (EDCs) is an important issue in environmental risk assessment. Most known EDCs act through the androgen receptor (AR) and oestrogen receptors (ERs). Because experimental tests are costly and time-consuming, in silico methods are valuable alternative tools for identifying EDCs. In this study, a large dataset related to EDCs was constructed. Each molecule was represented with seven fingerprints, and computational models were then developed to predict AR and ER binders using four machine learning methods: k-nearest neighbour (kNN), C4.5 decision tree (C4.5 DT), naïve Bayes (NB), and support vector machine (SVM). The best model for predicting AR binders was PubChem Fingerprint-SVM, which achieved an accuracy of 0.84. For ER binders, the best model was Extended Fingerprint-SVM, with an accuracy of 0.79. Moreover, several representative substructure alerts for characterizing EDCs, such as phenol, trifluoromethyl, and annelated rings, were identified by combining information gain with substructure frequency analysis. The study provides a systematic computational assessment of EDCs related to AR and ERs, and the structural characteristics it reports should help in identifying EDCs.
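The combination of information gain and substructure frequency mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: each substructure is treated as one binary fingerprint bit, information gain is the standard entropy reduction for a binary split, and "frequency" is taken to be an enrichment ratio (binder fraction among molecules containing the substructure, divided by the overall binder fraction). The function names and the toy data are hypothetical and are not taken from the study.

```python
import math


def entropy(labels):
    """Shannon entropy in bits of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(bits, labels):
    """Entropy reduction obtained by splitting molecules on one fingerprint bit."""
    present = [y for b, y in zip(bits, labels) if b]
    absent = [y for b, y in zip(bits, labels) if not b]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def binder_enrichment(bits, labels):
    """Assumed frequency measure: binder fraction among molecules that
    contain the substructure, relative to the overall binder fraction."""
    present = [y for b, y in zip(bits, labels) if b]
    if not present or sum(labels) == 0:
        return 0.0
    return (sum(present) / len(present)) / (sum(labels) / len(labels))


# Toy data (hypothetical): 1 = binder, 0 = non-binder.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical bit that is set mostly in binders.
alert_bit = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical bit that is unrelated to the class label.
noise_bit = [1, 0, 1, 0, 1, 0, 1, 0]

for name, bits in (("alert_bit", alert_bit), ("noise_bit", noise_bit)):
    print(name, round(information_gain(bits, labels), 4),
          round(binder_enrichment(bits, labels), 2))
```

Under these assumptions, a bit with both a high information gain and an enrichment ratio well above 1 would be a candidate substructure alert, whereas a bit with near-zero gain carries no class information regardless of how often it occurs.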
