Discriminative models using molecular descriptors for predicting increased serum ALT levels in repeated-dose toxicity studies of rats

Article ID	Journal	Published Year	Pages	File Type
8376750	Computational Toxicology	2018	7 Pages	PDF

Abstract

Demand for alternatives to animal experiment-based assessment is increasing, but alternatives for assessing repeated-dose toxicity have yet to be developed. Our aim was to develop discriminative models that use molecular descriptors to predict an increase in serum alanine aminotransferase (ALT) levels in rats. In vivo rat data for the training data sets were obtained from the Hazard Evaluation Support System Integrated Platform (HESS), and molecular descriptors were calculated using DRAGON 6. The discriminative models were based on logistic regression, which raised two statistical difficulties: (i) the number of molecular descriptors was much greater than the number of compounds; (ii) the training data sets were imbalanced. We addressed the first difficulty with the k-medoids method and the second with the Synthetic Minority Over-sampling Technique (SMOTE) algorithm. One of the resulting models showed predictive capability, with a sensitivity of 0.783, a specificity of 0.745, and a concordance of 0.750. Our results show that a statistical learning approach can create a discriminative model with high predictive capability using only information on the molecular descriptors of chemicals.
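
The three performance measures quoted above can be illustrated with a minimal sketch. It assumes the usual confusion-matrix definitions, with concordance taken as the overall fraction of correct predictions; the abstract does not spell these definitions out. The function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, concordance) from confusion-matrix counts.

    tp: positives (e.g. compounds with increased ALT) predicted positive
    fn: positives predicted negative
    tn: negatives predicted negative
    fp: negatives predicted positive
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative compound")
    sensitivity = tp / (tp + fn)                    # fraction of positives detected
    specificity = tn / (tn + fp)                    # fraction of negatives detected
    concordance = (tp + tn) / (tp + fn + tn + fp)   # overall fraction correct
    return sensitivity, specificity, concordance


# Made-up counts for illustration only (NOT the study's data):
# 10 positive and 40 negative compounds, mimicking an imbalanced set.
sens, spec, conc = classification_metrics(tp=8, fn=2, tn=30, fp=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} concordance={conc:.3f}")
```

With an imbalanced data set, concordance is dominated by the majority class, which is why sensitivity and specificity are reported alongside it.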

Keywords

Feature selection; Molecular descriptors; Hepatotoxicity; Imbalanced data set; Discriminative models