Discrimination of raw and processed Dipsacus asperoides by near infrared spectroscopy combined with least squares-support vector machine and random forests

Article ID	Journal	Published Year	Pages	File Type
1234643	Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy	2012	7 Pages	PDF

Abstract

Most herbal medicines can be processed to meet different therapeutic requirements. The purpose of this study was to discriminate between raw and processed Dipsacus asperoides, a common traditional Chinese medicine, based on their near infrared (NIR) spectra. Least squares-support vector machine (LS-SVM) and random forests (RF) were employed for full-spectrum classification. Three types of kernels, namely the linear kernel, the polynomial kernel and the radial basis function (RBF) kernel, were evaluated to optimize the LS-SVM model. For comparison, a linear discriminant analysis (LDA) model was also built, with the successive projections algorithm (SPA) applied beforehand to select an appropriate subset of wavelengths. The three methods were applied to a dataset containing 40 raw herbs and 40 corresponding processed herbs. Model performance was evaluated with 50 runs of 10-fold cross validation. LS-SVM with the RBF kernel (RBF LS-SVM) performed better than LS-SVM with the other two kernels. RF, RBF LS-SVM and SPA-LDA successfully classified all test samples. The mean error rates for the 50 runs of 10-fold cross validation were 1.35% for RBF LS-SVM, 2.87% for RF, and 2.50% for SPA-LDA. The best classification results were obtained with RBF LS-SVM, while RF was fast in both training and prediction.
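The evaluation protocol described in the abstract (an RBF-kernel LS-SVM classifier scored by repeated 10-fold cross validation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the standard LS-SVM classifier linear system, synthetic Gaussian data in place of NIR spectra, and arbitrary hyperparameters. The function names, `gam`, `sigma2`, and the random seeds are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """RBF kernel matrix: K[i, j] = exp(-||a_i - b_j||^2 / (2 * sigma2))."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))

def lssvm_fit(X, y, gam, sigma2):
    """Solve the LS-SVM classifier linear system; labels y must be -1 or +1."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * rbf_kernel(X, X, sigma2) + np.eye(n) / gam
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]  # alpha (support values), b (bias)

def lssvm_predict(X_train, y_train, alpha, b, X_new, sigma2):
    """Predict -1 or +1 from the sign of the LS-SVM decision function."""
    K = rbf_kernel(X_new, X_train, sigma2)
    return np.where(K @ (alpha * y_train) + b >= 0, 1, -1)

def repeated_kfold_error(X, y, n_runs, k, gam, sigma2, seed=0):
    """Mean misclassification rate over n_runs random k-fold partitions."""
    rng = np.random.default_rng(seed)
    n = len(y)
    run_errors = []
    for _ in range(n_runs):
        folds = np.array_split(rng.permutation(n), k)
        wrong = 0
        for test_idx in folds:
            train = np.ones(n, dtype=bool)
            train[test_idx] = False
            alpha, b = lssvm_fit(X[train], y[train], gam, sigma2)
            pred = lssvm_predict(X[train], y[train], alpha, b, X[test_idx], sigma2)
            wrong += int(np.sum(pred != y[test_idx]))
        run_errors.append(wrong / n)
    return float(np.mean(run_errors))

# Synthetic stand-in for the dataset: 40 samples per class, 20 features.
# These are NOT NIR spectra; the class shift of 1.0 is an arbitrary choice.
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 20))
X[40:] += 1.0
y = np.repeat([-1.0, 1.0], 40)

# gam and sigma2 are illustrative values, not the tuned values from the study.
err = repeated_kfold_error(X, y, n_runs=50, k=10, gam=10.0, sigma2=20.0)
print(f"mean error over 50 runs of 10-fold CV: {err:.3f}")
```

In practice the regularization parameter and kernel width would be tuned, and the tuning would need to happen inside each training fold to avoid optimistic error estimates.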

Graphical abstract

The NIR profiles of raw and processed Dipsacus asperoides were very complex and similar, so chemometric tools were used for the discrimination.

Highlights

► The aim of this study is to discriminate between raw and processed herbs based on NIR spectroscopy.
► Support vector machine, random forests and linear discriminant analysis are used for discrimination.
► The three chemometric tools achieve 100% classification accuracy for the testing set.
► Support vector machine produces better classification results, and RF is very fast in both training and prediction.

Keywords

Random forests; Classification; Near infrared spectroscopy; Least squares-support vector machine