Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
1394745 | European Journal of Medicinal Chemistry | 2010 | 7 Pages |
Optimal descriptors calculated from the simplified molecular input line entry system (SMILES) were used to build quantitative structure–activity relationships (QSAR) for carcinogenicity (log TD50). Three modeling schemes were examined: (1) the traditional “classic” training-test scheme, in which models are built with a training set and validated with an external test set; (2) the correlation balance, in which models are built with a preliminary estimate of their predictability on a calibration set (this set serves as a preliminary test set); and (3) the extended correlation balance, which also takes into account the slopes of the regression lines in plots of experimental versus predicted carcinogenicity (ideally, these slopes should be similar). The extended correlation balance with the ideal slopes gave the most robust prediction of carcinogenicity for the external test set. The models were built by the Monte Carlo method for three splits into subtraining, calibration, and test sets. The number of N-nitroso groups (i.e., R1–N(R2)–NO) in a molecule was examined as an additional descriptor.
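The difference between the correlation balance and its extended form can be illustrated with a small sketch. The abstract does not give the objective function, so the form below is an assumption: it rewards high correlation on both the subtraining and calibration sets, penalizes the gap between the two correlation coefficients, and, in the extended variant, also penalizes the gap between the two regression slopes. The function names and the weights `w_r` and `w_slope` are hypothetical, as is the choice of comparing slopes between the subtraining and calibration sets.

```python
from statistics import mean


def _sums(x, y):
    """Return (Sxy, Sxx, Syy), the centered sums of products and squares."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    sxy, sxx, syy = _sums(x, y)
    return sxy / (sxx * syy) ** 0.5


def slope(x, y):
    """Least-squares slope of the regression line y = a + b * x."""
    sxy, sxx, _ = _sums(x, y)
    return sxy / sxx


def balance_target(sub, cal, w_r=0.1, w_slope=0.0):
    """Assumed objective for a Monte Carlo search (not taken from the paper).

    sub, cal: (predicted, experimental) pairs of sequences for the
    subtraining and calibration sets.
    w_slope = 0 gives a plain correlation balance; w_slope > 0 adds the
    slope term of the extended correlation balance.
    """
    r_sub = pearson_r(*sub)
    r_cal = pearson_r(*cal)
    score = r_sub + r_cal - w_r * abs(r_sub - r_cal)
    if w_slope:
        score -= w_slope * abs(slope(*sub) - slope(*cal))
    return score


if __name__ == "__main__":
    # Toy values only; these are not data from the study.
    sub = ([1, 2, 3], [2, 4, 6])   # slope 2
    cal = ([1, 2, 3], [3, 6, 9])   # slope 3
    print(balance_target(sub, cal))               # balance only
    print(balance_target(sub, cal, w_slope=0.5))  # extended balance
```

In this toy case both sets are perfectly correlated, so the plain balance cannot tell them apart, whereas the extended variant lowers the score because the slopes differ. A Monte Carlo optimization of descriptor weights would, under these assumptions, accept a random change only when it raises this score.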