Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
403552 | 677265 | 2015 | 9-page PDF | Free download |
This paper proposes an approach called TESC (TExt classification using Semi-supervised Clustering) to improve text classification. The basic idea is to regard each category of texts as consisting of one or more components, and to use clustering to identify those components in a text collection. During clustering, TESC uses labeled texts to capture the silhouettes of text clusters and unlabeled texts to adapt their centroids. Each text cluster is assigned the category label of the labeled texts it contains. When a new unlabeled text arrives, we measure its similarity to the text clusters and assign it the label of the nearest cluster. Experiments on the Reuters-21578 and TanCorp V1.0 text collections demonstrate that, in text classification, TESC outperforms Support Vector Machines (SVMs) and a back propagation neural network (BPNN), and achieves performance comparable to naïve Bayes with EM (Expectation Maximization) but with lower computational complexity.
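The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' TESC algorithm, whose details are not given here; it is a simplified stand-in that assumes bag-of-words vectors, cosine similarity, and a hypothetical similarity `threshold` for deciding when a text starts a new cluster. All function and class names (`vectorize`, `cosine`, `Cluster`, `build_clusters`, `classify`) and the toy data are illustrative assumptions.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Cluster:
    """One component of a category (hypothetical helper, not from the paper)."""

    def __init__(self, label, vec):
        self.label = label
        # Summed term counts; cosine is scale-invariant, so this stands in
        # for the centroid without dividing by the cluster size.
        self.total = Counter(vec)

    def add(self, vec):
        self.total.update(vec)


def build_clusters(labeled, unlabeled, threshold=0.3):
    """Assumed simplification: labeled texts seed clusters, unlabeled texts
    only shift the centroid of the nearest cluster."""
    clusters = []
    for text, label in labeled:
        vec = vectorize(text)
        same = [c for c in clusters if c.label == label]
        best = max(same, key=lambda c: cosine(vec, c.total), default=None)
        if best is not None and cosine(vec, best.total) >= threshold:
            best.add(vec)  # close enough: same component of the category
        else:
            clusters.append(Cluster(label, vec))  # new component
    for text in unlabeled:
        vec = vectorize(text)
        best = max(clusters, key=lambda c: cosine(vec, c.total))
        if cosine(vec, best.total) >= threshold:
            best.add(vec)  # adapt the centroid; the label is unchanged
    return clusters


def classify(text, clusters):
    """Label a new text with the label of its most similar cluster."""
    vec = vectorize(text)
    return max(clusters, key=lambda c: cosine(vec, c.total)).label


# Toy data, invented for illustration only.
labeled = [
    ("stock market shares rise", "finance"),
    ("bank interest rates loan", "finance"),
    ("football match goal score", "sport"),
    ("tennis player wins match", "sport"),
]
unlabeled = ["shares fall on stock exchange", "goal scored in football final"]

clusters = build_clusters(labeled, unlabeled)
print(len(clusters), classify("stock shares drop", clusters))
```

In this toy run each category ends up with two clusters, which mirrors the premise that a category may consist of more than one component. A single centroid per category would blur such components together, whereas nearest-cluster labeling keeps them separate.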
Journal: Knowledge-Based Systems - Volume 75, February 2015, Pages 152–160