Article ID: 8145642
Journal: Infrared Physics & Technology
Published year: 2018
Pages: 11
File type: PDF
Abstract
Traditional methods for identifying the origin of coal suffer from complex operation, sample damage and environmental pollution. Near-infrared (NIR) spectroscopy is a newer technique that can address these problems effectively. However, the spectra of the coal samples were high-dimensional, redundant and noisy, and the data set was small and imbalanced. This study therefore chose the Random Forest (RF) algorithm as the base modeling algorithm and introduced the K-means algorithm to improve the Synthetic Minority Oversampling Technique (SMOTE) so as to handle the imbalanced data set. A comparison of the Support Vector Machine (SVM) model, the RF model and the improved RF model showed that the improved RF model reached an overall accuracy of 97.92%, a G-mean of 0.9696 and an average voting rate of 83.09%, which were 6.25%, 7.03% and 6.94% higher, respectively, than those of the RF model. Its accuracy and G-mean were also 8.34% and 5.86% higher than those of the SVM model. These results suggest that the improved RF model offers reliable accuracy, validity and stability, and its results are consistent with an analysis of the coal-forming factors. Consequently, the algorithm is suitable for rapidly identifying the geographic origin of coal.
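The abstract does not say exactly how K-means was combined with SMOTE or how the G-mean was computed. As a rough illustration only, the Python sketch below assumes one common pattern: cluster the minority-class samples with K-means, then create SMOTE-style synthetic samples by interpolating between a sample and one of its nearest neighbours within the same cluster, so new points are not generated across distant regions of feature space. The function names (`kmeans_smote`, `g_mean`), the size-proportional allocation across clusters and the multi-class G-mean (geometric mean of per-class recalls) are all assumptions, not the authors' method. It uses only the standard library; real NIR spectra would be long feature vectors rather than 2-D points.

```python
import math
import random


def g_mean(y_true, y_pred):
    """Assumed multi-class G-mean: geometric mean of the per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return math.prod(recalls) ** (1.0 / len(recalls))


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center wins (ties go to the lower index).
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def kmeans_smote(minority, n_new, n_clusters=2, k_neighbors=3, seed=0):
    """Hypothetical K-means + SMOTE sketch: interpolate only inside clusters."""
    rng = random.Random(seed)
    labels = kmeans(minority, n_clusters, rng)
    clusters = [[p for p, lab in zip(minority, labels) if lab == j]
                for j in range(n_clusters)]
    usable = [c for c in clusters if len(c) >= 2]  # need at least one neighbour
    if not usable:
        raise ValueError("no cluster has two or more minority samples")
    synthetic = []
    while len(synthetic) < n_new:
        # Assumed allocation rule: pick a cluster with probability proportional to its size.
        cluster = rng.choices(usable, weights=[len(c) for c in usable])[0]
        i = rng.randrange(len(cluster))
        base = cluster[i]
        others = [p for j, p in enumerate(cluster) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k_neighbors]
        nb = rng.choice(neighbours)
        gap = rng.random()  # SMOTE step: random point on the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```

For instance, calling `kmeans_smote(minority, 10, n_clusters=2, k_neighbors=2, seed=1)` on two tight, well-separated groups of minority points should return 10 new points that each lie inside one group rather than in the empty space between them, which is the motivation for clustering before oversampling. In a full pipeline, one would oversample each minority class on the training split only, train the RF classifier, and report accuracy and G-mean on held-out samples.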