Article ID: 1394745
Journal: European Journal of Medicinal Chemistry
Published Year: 2010
Pages: 7 Pages
File Type: PDF
Abstract

Optimal descriptors calculated from the simplified molecular input line entry system (SMILES) were used to build quantitative structure–activity relationships (QSAR) for carcinogenicity (log TD50). Three modeling schemes were examined: (1) the traditional “classic” training-test scheme, in which models are built with a training set and validated with an external test set; (2) the correlation balance, in which the predictability of the model is estimated in advance with a calibration set, which acts as a preliminary test set; and (3) the extended correlation balance, which also takes into account the slopes of the regression lines in plots of experimental versus predicted carcinogenicity (ideally, these slopes should be similar). The extended correlation balance with the ideal slopes gave the most robust prediction of carcinogenicity for the external test set. The models were built by the Monte Carlo method for three splits into subtraining, calibration, and test sets. The number of N-nitroso groups (i.e., R1–N(R2)–NO) in a molecule was examined as an additional descriptor.
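
The third scheme can be illustrated with a small sketch of a balance score that a Monte Carlo search might try to maximize. This is not the authors' target function, whose exact form is not given in the abstract. It assumes the score rewards high correlation on both the subtraining and calibration sets and penalizes differences between their correlation coefficients and between their regression slopes. The function names and the weights `w_r` and `w_slope` are hypothetical.

```python
from math import sqrt


def pearson_r_and_slope(predicted, experimental):
    """Return (r, slope) for the least-squares line experimental = slope * predicted + b."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(experimental) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    syy = sum((y - mean_y) ** 2 for y in experimental)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, experimental))
    return sxy / sqrt(sxx * syy), sxy / sxx


def extended_balance(subtraining, calibration, w_r=0.1, w_slope=0.1):
    """Hypothetical balance score (an assumption, not the published formula).

    Each argument is a (predicted, experimental) pair of equal-length lists.
    The score is high when both sets correlate well and when their
    correlation coefficients and regression slopes are close to each other.
    """
    r_sub, slope_sub = pearson_r_and_slope(*subtraining)
    r_cal, slope_cal = pearson_r_and_slope(*calibration)
    return (
        r_sub
        + r_cal
        - w_r * abs(r_sub - r_cal)            # plain correlation balance term
        - w_slope * abs(slope_sub - slope_cal)  # extension: slope mismatch penalty
    )
```

Setting `w_slope=0` reduces the sketch to a plain correlation balance, which makes it easy to compare the two schemes on the same split.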


Related Topics
Physical Sciences and Engineering Chemistry Organic Chemistry