SMILES-based optimal descriptors: QSAR modeling of carcinogenicity by balance of correlations with ideal slopes

Article ID	Journal	Published Year	Pages	File Type
1394745	European Journal of Medicinal Chemistry	2010	7 Pages	PDF

Abstract

Optimal descriptors calculated from the simplified molecular input line entry system (SMILES) were used to build quantitative structure–activity relationships (QSAR) for carcinogenicity (log TD50). Three modeling schemes were examined: (1) the traditional “classic” training-test scheme, in which models are built with a training set and validated with an external test set; (2) the correlation balance, in which models are built with a preliminary estimate of their predictive ability on a calibration set (this set acts as a preliminary test set); and (3) the extended correlation balance, which also takes into account the slopes of the regression lines in plots of experimental versus predicted carcinogenicity (ideally, these slopes should be similar). The extended correlation balance with ideal slopes gave the most robust predictions of carcinogenicity for the external test set. The models were built by the Monte Carlo method for three splits into subtraining, calibration, and test sets. The number of N-nitroso groups (i.e., R1–N(R2)–NO) in a molecule was examined as an additional descriptor.
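The abstract does not give the objective function that the Monte Carlo optimization maximizes, so the following is only a minimal sketch of how an extended correlation balance might be scored. The function name `balance_target`, the weights `w_r` and `w_slope`, and the exact form of the penalty terms are assumptions made for illustration, not the authors' formula. The sketch rewards high correlation on both the subtraining and calibration sets and penalizes differences between their correlation coefficients and between their regression slopes. It uses the slope of activity against descriptor as a stand-in for the regression lines mentioned above.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return (Sxy, Sxx, Syy) for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    sxy, sxx, syy = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    sxy, sxx, _ = _centered_sums(x, y)
    return sxy / sxx


def balance_target(desc_sub, act_sub, desc_cal, act_cal, w_r=0.1, w_slope=0.1):
    """Hypothetical target function for an extended correlation balance.

    desc_* are descriptor values and act_* are activities (e.g. log TD50)
    for the subtraining and calibration sets. The weights w_r and w_slope
    are illustrative assumptions, not values from the article.
    """
    r_sub = pearson_r(desc_sub, act_sub)
    r_cal = pearson_r(desc_cal, act_cal)
    s_sub = slope(desc_sub, act_sub)
    s_cal = slope(desc_cal, act_cal)
    # Reward correlation on both sets; penalize imbalance between the
    # correlation coefficients and between the regression slopes.
    return (r_sub + r_cal
            - w_r * abs(r_sub - r_cal)
            - w_slope * abs(s_sub - s_cal))
```

In a Monte Carlo search, a score of this kind would be recomputed after each random change to the descriptor weights, and a change would be kept only if the score improved. The external test set takes no part in this scoring and is reserved for the final validation.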


Keywords

QSAR; Optimal descriptor; Carcinogenicity