Article ID: 443491
Journal ID: 692727
Publication year: 2012
English article: 7 pages, PDF
Full-text version: free download
English title of the ISI article
In silico prediction of toxic action mechanisms of phenols for imbalanced data with Random Forest learner
Related subjects
Engineering and basic sciences; chemistry; theoretical and practical chemistry
Abstract

With an increasing need for rapid and effective safety assessment of compounds in industrial and civil-use products, in silico toxicity exploration techniques provide an economical means of environmental hazard assessment. Over the last decade, in silico research has produced many quantitative structure–activity relationship (QSAR) models for predicting toxicity mechanisms. Most of these methods rely on data analysis and machine learning techniques, whose performance depends heavily on the characteristics of the data sets. Tetrahymena pyriformis toxicity data sets pose a major technical challenge: data imbalance. A skewed class distribution greatly degrades prediction performance on rare classes. Most previous studies on predicting the mechanisms of toxic action of phenols did not consider this practical problem. In this work, we addressed it by accounting for the difference between the two types of misclassification. A Random Forest learner was employed within a cost-sensitive learning framework to construct prediction models based on selected molecular descriptors. In computational experiments, both the global and local models achieved appreciable overall prediction accuracy. In particular, performance on the rare classes improved. Moreover, for practical use of these models, the balance between the two types of misclassification can be adjusted by using different cost matrices according to the application goals.
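
The abstract does not say how the cost matrix enters the Random Forest, so the following is only a minimal sketch of one common cost-sensitive scheme: given class probability estimates (for a forest, these could be the fraction of tree votes per class), predict the class with the lowest expected misclassification cost. The function name, the probabilities, and the cost values are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def min_expected_cost_class(
    probs: Sequence[float], cost: Sequence[Sequence[float]]
) -> int:
    """Return the index of the class with the lowest expected cost.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
    """
    n_classes = len(probs)
    # Expected cost of predicting j, averaged over the possible true classes i.
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Hypothetical two-class case: class 0 is the common class, class 1 the rare one.
probs = [0.7, 0.3]

equal_cost = [[0, 1], [1, 0]]   # both kinds of error cost the same
rare_costly = [[0, 1], [5, 0]]  # missing a rare-class compound costs 5x (assumed)

print(min_expected_cost_class(probs, equal_cost))   # picks the common class
print(min_expected_cost_class(probs, rare_costly))  # picks the rare class
```

With equal costs the rule reduces to picking the most probable class. Raising the cost of missing the rare class shifts borderline predictions toward it, which is the kind of trade-off between the two misclassification types that the abstract describes as adjustable through the cost matrix.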

Highlights
► We constructed QSAR models for predicting toxicity mechanisms.
► The results suggested that Random Forest is a useful tool for mechanism prediction.
► On its own, Random Forest failed to deal with the imbalanced data problem.
► Cost-sensitive learning improved the performance of Random Forest on rare classes.
► The two types of misclassifications were balanced by using cost matrices.
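
To illustrate why overall accuracy can hide poor performance on a rare class, here is a small sketch that computes accuracy alongside per-class recall. The labels and counts are made up for illustration and are not the paper's data; `accuracy_and_recalls` is a hypothetical helper name.

```python
from collections import Counter


def accuracy_and_recalls(y_true, y_pred):
    """Return (overall accuracy, {class: recall}) for two label sequences."""
    total = Counter(y_true)  # number of true examples per class
    correct = Counter()      # number of correctly predicted examples per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct[t] += 1
    accuracy = sum(correct.values()) / len(y_true)
    recalls = {c: correct[c] / total[c] for c in total}
    return accuracy, recalls


# Hypothetical imbalanced set: 90 common-class and 10 rare-class compounds,
# scored against a classifier that always predicts the common class.
y_true = ["common"] * 90 + ["rare"] * 10
y_pred = ["common"] * 100

acc, rec = accuracy_and_recalls(y_true, y_pred)
print(acc)  # 0.9
print(rec)  # {'common': 1.0, 'rare': 0.0}
```

A classifier that never predicts the rare class still reaches 90% accuracy here, so per-class recall is the more informative check when classes are skewed.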

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Molecular Graphics and Modelling - Volume 35, May 2012, Pages 21–27