Article ID: 1181488
Journal: Chemometrics and Intelligent Laboratory Systems
Published Year: 2012
Pages: 5
File Type: PDF
Abstract

A kernel version of the k-nearest neighbor algorithm (k-NN) has been developed to model the complex relationship between molecular descriptors and the bioactivities of compounds. Kernel k-NN performs the original k-NN algorithm after mapping the training samples from the input space into a high-dimensional feature space. It is easily constructed, because the distance between samples in the feature space follows directly from simple evaluations of the kernel used. The developed kernel k-NN is flexible enough to handle complex nonlinear relationships; more importantly, it can also conveniently cope with non-vectorial data simply by defining a different kernel. Results obtained on several real SAR datasets indicate that the performance of kernel k-NN is comparable to that of support vector machine methods. It can be regarded as an alternative modeling technique for several chemical problems, including the study of structure–activity relationships (SAR). The source code implementing kernel k-NN in the R language is freely available at http://code.google.com/p/kernelmethods/.
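The construction described in the abstract rests on a standard identity: for a kernel k(x, y) = <phi(x), phi(y)>, the squared distance in feature space is ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y), so the mapping phi never has to be computed explicitly. The following is a minimal sketch of that idea in Python, not the authors' R implementation. The function names, the choice of an RBF kernel, the value of `gamma`, and the majority-vote rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice, not a value from the paper."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def kernel_sq_distance(x, y, kernel):
    """Squared feature-space distance computed from kernel evaluations only."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


def kernel_knn_predict(train_X, train_y, query, kernel=rbf_kernel, k=3):
    """Plain k-NN majority vote, with neighbors ranked by kernel-induced distance."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: kernel_sq_distance(query, train_X[i], kernel),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated groups of 2-D descriptor vectors.
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
pred_near_origin = kernel_knn_predict(train_X, train_y, [0.2, 0.1])
pred_far = kernel_knn_predict(train_X, train_y, [5.5, 5.2])
```

With a linear kernel (the ordinary dot product) the same distance function reduces to the squared Euclidean distance, which is one way to check that the kernelized version generalizes ordinary k-NN rather than replacing it.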

► A kernel version of k-NN has been developed.
► The performance of kernel k-NN is competitive with that of SVM.
► Kernel k-NN can cope with non-vectorial data such as string data.
► A weighted kernel k-NN was developed to allow the construction of ROC curves.
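The last two highlights can be illustrated together with a small, hedged sketch. It assumes a p-spectrum string kernel (counting shared substrings of length p) as the non-vectorial kernel and inverse-distance weights for the neighbors. Neither choice is confirmed by the abstract, and all names below are hypothetical. The point is only that weighting turns the hard k-NN vote into a continuous score, which can then be thresholded at different levels to trace an ROC curve.

```python
import math
from collections import Counter


def spectrum_kernel(s, t, p=2):
    """p-spectrum kernel: inner product of substring-count vectors (assumed kernel choice)."""
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return float(sum(cs[g] * ct[g] for g in cs))


def kernel_distance(x, y, kernel):
    """Feature-space distance from kernel values; clamp guards against rounding below zero."""
    sq = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(sq, 0.0))


def weighted_knn_score(train_X, train_y, query, kernel, k=3, eps=1e-9):
    """Fraction of inverse-distance weight held by positive (label 1) neighbors, in [0, 1]."""
    nearest = sorted(
        (kernel_distance(query, x, kernel), y) for x, y in zip(train_X, train_y)
    )[:k]
    weights = [(1.0 / (d + eps), y) for d, y in nearest]
    total = sum(w for w, _ in weights)
    return sum(w for w, y in weights if y == 1) / total


# Toy SMILES-like strings (hypothetical labels): aliphatic alcohols = 1, aromatics = 0.
train_X = ["CCCO", "CCCCO", "CCO", "c1ccccc1", "c1ccccc1C", "c1ccncc1"]
train_y = [1, 1, 1, 0, 0, 0]
score_aliphatic = weighted_knn_score(train_X, train_y, "CCCCCO", spectrum_kernel)
score_aromatic = weighted_knn_score(train_X, train_y, "c1ccccc1O", spectrum_kernel)
```

Because only the kernel function changes, the same scoring routine works unchanged for vector descriptors, strings, or any other data type for which a valid kernel can be defined.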
