Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
9590556 | Journal of Molecular Structure: THEOCHEM | 2005 | 9 Pages |
Abstract
A set of molecular descriptors, including electronic, topological, and geometric descriptors and molecular shape indices, is calculated to characterize the structural and physicochemical properties of 94 chemical compounds: 42 with antifungal activity and 52 inactive. A support vector machine (SVM) classification method is employed to discriminate between the antifungal-active and inactive compounds. The leave-one-out (LOO) cross-validation method is used to optimize the SVM model, and a genetic algorithm is used for variable selection, which reduces the number of molecular descriptors from 67 to 30. Five-fold cross-validation and an independent evaluation set are used to test the SVM model. The training sets are chosen evenly across the descriptor space by clustering the compounds according to their chemical similarity, and both test methods give results consistent with the LOO method. Compared with the LOO method, 5-fold cross-validation and the independent test method require much less time for SVM model optimization, and LOO cross-validation becomes impractical for very large data sets. Our work suggests that, with a proper choice of training set, 5-fold cross-validation or the independent test method can give results consistent with the LOO method. The SVM results are also compared with those of other statistical classification methods, for example k-nearest neighbor (k-NN) and the C4.5 decision tree, using the same pre-selected molecular descriptors. Our investigation indicates the potential of SVM for facilitating the prediction of antifungal activity.
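The trade-off between LOO and 5-fold cross-validation described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic (two Gaussian clouds sized 42 and 52 to mirror the class counts), a simple k-NN classifier stands in for the SVM so the example needs only the standard library, and the folds are taken by plain interleaving after a shuffle rather than by the similarity-based clustering the abstract mentions. All function names and parameters here are hypothetical.

```python
import random


def make_synthetic_data(n_active=42, n_inactive=52, n_features=5, seed=0):
    """Hypothetical stand-in for a descriptor matrix: two Gaussian clouds."""
    rng = random.Random(seed)
    data = []
    for label, count, centre in ((1, n_active, 1.0), (0, n_inactive, -1.0)):
        for _ in range(count):
            features = [rng.gauss(centre, 1.0) for _ in range(n_features)]
            data.append((features, label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def cross_validate(data, n_folds, k=3):
    """Accuracy of k-NN under n_folds-fold cross-validation.

    Setting n_folds == len(data) gives leave-one-out: one model fit per sample.
    """
    correct = 0
    for fold in range(n_folds):
        # Every n_folds-th sample, starting at index `fold`, is held out.
        test = data[fold::n_folds]
        train = [item for i, item in enumerate(data) if i % n_folds != fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


data = make_synthetic_data()
loo_accuracy = cross_validate(data, n_folds=len(data))  # 94 train/test rounds
five_fold_accuracy = cross_validate(data, n_folds=5)    # 5 train/test rounds

print(f"LOO accuracy:    {loo_accuracy:.3f}")
print(f"5-fold accuracy: {five_fold_accuracy:.3f}")
```

LOO needs one training round per compound (94 here), whereas 5-fold needs only five, which is why the cost gap widens as the data set grows. On well-separated synthetic data the two estimates tend to be close; how close they are on real descriptors depends on how the folds are chosen.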
Related Topics
Physical Sciences and Engineering
Chemistry
Physical and Theoretical Chemistry
Authors
Shi-Wei Chen, Ze-Rong Li, Xiang-Yuan Li