Two-class support vector machine with new kernel function based on paths of features for predicting chemical activity

Article ID	Journal	Published Year	Pages	File Type
4944483	Information Sciences	2017	12 Pages	PDF

Abstract

Techniques from information and computer science, such as machine learning and graph theory, are applied in chemoinformatics to discover the properties of chemical compounds. This paper presents a new algorithm based on the two-class support vector machine (SVM) model, with new kernel functions defined over paths of features, for predicting the activity of chemical compounds. First, we extract all paths of features (star subgraphs) of certain lengths and encode them according to their structure in the graphs. Then, we use these codes to construct two relationship matrices between the paths. These matrices contain the common and the different sub-paths between paths of stars. The number of sub-paths/paths for each compound is passed to the proposed kernel functions in the two-class SVM to predict the activity of chemical compounds. The relationship matrices created by the proposed algorithm help to reduce the number of features, which improves prediction accuracy. We apply the proposed algorithm, with and without feature selection, to two benchmark datasets: the monoamine oxidase (MAO) dataset and the AIDS antiviral screen dataset of active compounds, which contain 68 and 2000 chemical compounds, respectively. We compare the proposed kernel functions with many other two-class SVM prediction methods. Before feature selection, the prediction accuracies are 94% and 99.5% for MAO and AIDS, respectively; after feature selection, they are 96% and 99.5%.
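
To make the general idea of a path-based graph kernel concrete, the following is a minimal sketch in Python. It is not the kernel proposed in the paper: the abstract does not specify the star-subgraph encoding or the two relationship matrices, so the sketch substitutes a plain path-count kernel (the dot product of label-sequence counts over simple paths). The function names `path_label_counts` and `path_kernel`, the adjacency-list graph representation, and the two toy molecules are all assumptions made for illustration.

```python
from collections import Counter


def path_label_counts(labels, adjacency, max_len):
    """Count the label sequences of all simple paths with 1..max_len vertices.

    Each undirected path is visited once per direction, so both
    ("C", "O") and ("O", "C") are counted for a C-O bond.
    """
    counts = Counter()

    def extend(path):
        counts[tuple(labels[v] for v in path)] += 1
        if len(path) == max_len:
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:  # keep the path simple (no repeated vertex)
                extend(path + [nxt])

    for start in range(len(labels)):
        extend([start])
    return counts


def path_kernel(counts_a, counts_b):
    """Dot product of two path-count vectors (a stand-in, not the paper's kernel)."""
    return sum(c * counts_b[key] for key, c in counts_a.items())


# Hypothetical toy graphs: a C-C-O chain and a C-O pair (hydrogens omitted).
chain_cco = (["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]})
pair_co = (["C", "O"], {0: [1], 1: [0]})

features = [path_label_counts(lbls, adj, max_len=2)
            for lbls, adj in (chain_cco, pair_co)]

# Gram matrix of pairwise kernel values between the compounds.
gram = [[path_kernel(a, b) for b in features] for a in features]
print(gram)
```

A Gram matrix of this kind is the form of input that a two-class SVM with a precomputed kernel expects, for example scikit-learn's `SVC(kernel="precomputed")`; the feature reduction and feature selection steps described in the abstract are not modeled here.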

Keywords

Chemoinformatics; Graph kernel; Activity prediction