Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
1181250 | Chemometrics and Intelligent Laboratory Systems | 2016 | 7 Pages |

• A machine learning approach can be used for standard-free, LC–MS-based plant recognition.
• Real software applications require databases with entries for hundreds of species, with an appropriate dataset for each plant.
• A large training set for each individual plant is needed to avoid overfitting the model.
• Retention times of compounds give valuable qualitative information but cannot be used unprocessed.

Herbal medicines are vigorously marketed but poorly regulated, and analytical methodology for this field is still taking shape. One particular analytical task is confirming the species identity of medicinal plants used as ingredients. In this work, a machine learning approach was implemented for LC–MS plant species identification. Samples of 36 plant species were analyzed, and peak data (m/z, abundance) from these samples were used to develop classification algorithms, namely logistic regression (LR), support vector machine (SVM) and random forest (RF). For most of the algorithms used, classification accuracies of 95% or higher were obtained on the cross-validation dataset. Massive training datasets are now needed for full-scale application of this approach.
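
The abstract does not say how the (m/z, abundance) peak lists were encoded for the classifiers. As a minimal sketch, assuming a simple fixed-width m/z binning (a common encoding, not confirmed as the authors' method), a peak list could be turned into a fixed-length feature vector as follows. The function name `bin_peaks`, the m/z range, the bin width, and the sample peaks are all hypothetical.

```python
def bin_peaks(peaks, mz_min=100.0, mz_max=1000.0, bin_width=1.0):
    """Convert a list of (m/z, abundance) peaks into a fixed-length vector.

    The m/z range and bin width are illustrative assumptions,
    not values taken from the paper.
    """
    n_bins = int(round((mz_max - mz_min) / bin_width))
    vector = [0.0] * n_bins
    for mz, abundance in peaks:
        # Peaks outside the assumed m/z window are ignored.
        if mz_min <= mz < mz_max:
            idx = int((mz - mz_min) / bin_width)
            # Guard against floating-point rounding at the upper edge.
            idx = min(idx, n_bins - 1)
            vector[idx] += abundance
    total = sum(vector)
    # Scale to relative abundance so that samples with different
    # overall signal intensity remain comparable.
    if total > 0:
        vector = [v / total for v in vector]
    return vector


# Hypothetical peak list for one sample: (m/z, abundance) pairs.
sample = [(151.04, 200.0), (151.60, 100.0), (303.05, 700.0), (1200.0, 50.0)]
vec = bin_peaks(sample)
```

Vectors of this kind, one per sample and labeled with the plant species, would be a plausible input to classifiers such as LR, SVM or RF. Retention times are deliberately left out here, in line with the highlight that they cannot be used unprocessed.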