Article ID | Journal ID | Publication year | English article | Full-text version |
---|---|---|---|---|
402602 | 676968 | 2015 | 13-page PDF | Free download |
• A novel MK–MCOC–FP approach is proposed for predicting active compounds in bioassay data.
• The multi-kernel method reduces dimensionality and yields an interpretable classifier.
• A fuzzification method using class median minimizes the effect of outliers.
• Class-imbalance penalty factors tune the balance between overfitting and underfitting.
• Our proposed classifier obtains better performance in flexibility and accuracy.
Developing effective computational methods that accurately identify and predict biological activity in the virtual screening of bioassay data is important for speeding up drug development. Among these methods, the multi-criteria optimization classifier (MCOC) finds a trade-off between the degree of overlap among classes and the total distance from input points to the decision hyperplane; the former should be minimized while the latter should be maximized. A decision function is then derived from the training data and used to predict the class label of an unseen instance. However, because of outliers, anomalies, highly imbalanced classes, high dimensionality, nonlinear separability and other uncertainties in the data, MCOC and other methods often give poor predictive performance. In this paper, we introduce three changes: a new fuzzy contribution for each input point based on the class median; new row and column kernel functions, so that a linear combination of different feature kernels replaces the single kernel function in the kernel-induced feature space; and penalty factors for imbalanced classes. The result is a novel multi-kernel multi-criteria optimization classifier with fuzzification and penalty factors (MK–MCOC–FP), which significantly reduces the effects of the aforementioned problems.
Experimental results on predicting active compounds in virtual screening, together with comparisons against linear and quadratic MCOCs, support vector machines (SVM), fuzzy SVM and neural networks, show that MK–MCOC–FP evidently improves resistance to noise interference, predictive accuracy on highly class-imbalanced bioassay data, the separation of active from inactive compounds, the interpretability of the importance or contribution of different features to classification, the efficiency of classification with feature selection or dimensionality reduction for high-dimensional data, and generalization when predicting the biological activity of new compounds.
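The three ingredients named in the abstract (median-based fuzzification, class-imbalance penalties and a linear combination of feature kernels) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the membership rule (one minus the normalized distance to the coordinate-wise class median), the inverse-frequency penalty, the choice of one RBF kernel per feature, and all function and parameter names (`median_fuzzy_membership`, `class_penalties`, `combined_kernel`, `delta`, `gamma`, `betas`) are hypothetical and are not taken from MK–MCOC–FP itself.

```python
import numpy as np


def median_fuzzy_membership(X, y, delta=1e-6):
    """Assumed rule: membership in (0, 1] that shrinks as a point moves
    away from the coordinate-wise median of its own class."""
    s = np.empty(len(y), dtype=float)
    for label in np.unique(y):
        mask = (y == label)
        center = np.median(X[mask], axis=0)            # class median, per feature
        dist = np.linalg.norm(X[mask] - center, axis=1)
        s[mask] = 1.0 - dist / (dist.max() + delta)    # farthest point gets ~0
    return s


def class_penalties(y):
    """Assumed rule: inverse-frequency penalty, so the rarer class
    (e.g. active compounds) is penalized more heavily when misclassified."""
    labels, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(labels) * counts)
    return dict(zip(labels.tolist(), weights.tolist()))


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))        # clip tiny negative round-off


def combined_kernel(A, B, betas, gamma=1.0):
    """Linear combination of per-feature kernels: sum_j betas[j] * K_j.
    A feature with betas[j] == 0 drops out, which is one way a multi-kernel
    model can expose feature importance."""
    K = np.zeros((A.shape[0], B.shape[0]))
    for j, beta in enumerate(betas):
        K += beta * rbf_kernel(A[:, [j]], B[:, [j]], gamma)
    return K
```

In a classifier of this kind, the per-point memberships and per-class penalties would typically scale each training point's deviation term in the objective, and the combined kernel would stand in for the single kernel matrix. How MK–MCOC–FP actually does this is specified in the full paper, not in the abstract.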
Journal: Knowledge-Based Systems - Volume 89, November 2015, Pages 301–313