Article ID: 4950988
Journal: Journal of Computational Science
Published Year: 2017
Pages: 22
File Type: PDF
Abstract
The traditional KNN query is a stable and accurate algorithm. However, when the sample size is very large, its computational efficiency degrades considerably. This paper therefore proposes a parallel MKNN text classification algorithm based on clustering center text series. First, the amount of similarity computation is reduced by means of clustering centers: the original large-scale document samples are replaced with a relatively small number of cluster centers, which speeds up the KNN query process. Second, the MapReduce parallel framework is applied, in combination with the characteristics of text classification, to meet the real-time demands of large-scale text classification, overcoming the slow speed of the KNN query while keeping classification accuracy as high as possible. Finally, experiments comparing text classification accuracy and algorithmic efficiency against a similar single-threaded algorithm show that the proposed algorithm effectively improves classification speed while maintaining sufficient accuracy.
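The idea of replacing the full document set with a few centers before running KNN can be illustrated with a minimal sketch. This is not the paper's MKNN algorithm: it assumes documents are already dense feature vectors, uses one mean vector per class as a stand-in for the clustering step, and uses cosine similarity as the similarity measure. The function names `class_centers`, `knn_classify`, and `classify_batch` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def class_centers(vectors, labels):
    """Reduce the training set to one mean vector (center) per class.

    Assumption: a per-class centroid stands in for the clustering step;
    the paper's actual clustering procedure is not given in the abstract.
    """
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    centers = []
    for label, vecs in groups.items():
        mean = [sum(col) / len(vecs) for col in zip(*vecs)]
        centers.append((mean, label))
    return centers


def knn_classify(query, centers, k=1):
    """Vote among the k centers most similar to the query vector."""
    ranked = sorted(centers, key=lambda c: cosine(query, c[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def classify_batch(queries, centers, k=1):
    """Classify each query independently of the others.

    Because no query depends on another, this loop is the part that a
    map phase could distribute across workers; here it runs serially.
    """
    return [knn_classify(q, centers, k) for q in queries]
```

With this reduction, each query is compared against one vector per class rather than against every training document, which is where the saving in similarity computations comes from. The trade-off is that a single center may represent a class poorly when its documents are spread out, so accuracy depends on how well the centers summarize the data.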
Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics