Article code: 403552 | Journal code: 677265 | Year of publication: 2015
English full text: 9 pages (PDF)
ISI article title (English)
TESC: An approach to TExt classification using Semi-supervised Clustering
Related subjects
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
English abstract

This paper proposes an approach called TESC (TExt classification using Semi-supervised Clustering) to improve text classification. The basic idea is to regard each category of texts as being composed of one or more components. We therefore use clustering to identify these components in the text collection. During clustering, TESC uses labeled texts to capture the silhouettes of the text clusters and unlabeled texts to adjust their centroids. Each text cluster is then labeled with the category of the labeled texts it contains. When a new unlabeled text arrives, we measure its similarity to the text clusters and assign it the label of the nearest cluster. Experiments on the Reuters-21578 and TanCorp V1.0 text collections show that TESC outperforms Support Vector Machines (SVMs) and a back-propagation neural network (BPNN) in text classification, and achieves performance comparable to naïve Bayes with EM (Expectation Maximization) at a lower computational complexity.
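The classification scheme described in the abstract can be illustrated with a minimal Python sketch. It is not TESC as published: the abstract does not specify the clustering procedure, so the cosine similarity, the threshold that decides whether a labeled text joins an existing cluster of its category or starts a new component, and the fixed number of centroid-adaptation passes are all assumptions for illustration. The helper names (`build_clusters`, `adapt_centroids`, `classify`) are hypothetical, and texts are assumed to be already converted into numeric vectors (for example TF-IDF).

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


@dataclass
class Cluster:
    label: str                                    # category carried by the cluster
    members: list = field(default_factory=list)   # vectors currently assigned
    centroid: list = field(default_factory=list)

    def update(self):
        self.centroid = mean(self.members)


def build_clusters(labeled, threshold=0.5):
    """Hypothetical phase 1: grow clusters from labeled texts only.

    A labeled text joins the most similar cluster of its own category if the
    similarity reaches `threshold` (an assumed rule); otherwise it starts a new
    component, so one category may end up with several clusters.
    """
    clusters = []
    for vec, label in labeled:
        same = [c for c in clusters if c.label == label]
        best = max(same, key=lambda c: cosine(vec, c.centroid), default=None)
        if best is not None and cosine(vec, best.centroid) >= threshold:
            best.members.append(vec)
            best.update()
        else:
            clusters.append(Cluster(label, [vec], list(vec)))
    return clusters


def adapt_centroids(clusters, unlabeled, iterations=5):
    """Hypothetical phase 2: unlabeled texts join their nearest cluster and shift
    its centroid; the labeled members and the cluster labels never change."""
    seeds = [list(c.members) for c in clusters]
    for _ in range(iterations):
        extra = [[] for _ in clusters]
        for vec in unlabeled:
            i = max(range(len(clusters)), key=lambda k: cosine(vec, clusters[k].centroid))
            extra[i].append(vec)
        for c, seed, added in zip(clusters, seeds, extra):
            c.members = seed + added
            c.update()
    return clusters


def classify(clusters, vec):
    """Assign a new text the label of its nearest cluster."""
    return max(clusters, key=lambda c: cosine(vec, c.centroid)).label


# Toy 3-term vectors (invented for illustration only).
labeled = [
    ([1.0, 0.0, 0.0], "sport"),
    ([0.9, 0.1, 0.0], "sport"),
    ([0.0, 0.0, 1.0], "sport"),  # far from the other two: a second "sport" component
    ([0.0, 1.0, 0.0], "econ"),
]
unlabeled = [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]]

clusters = adapt_centroids(build_clusters(labeled), unlabeled)
print([c.label for c in clusters])            # ['sport', 'sport', 'econ']
print(classify(clusters, [0.0, 0.1, 0.9]))    # sport
```

In the toy data the third labeled "sport" vector is orthogonal to the first two, so the sketch keeps it as a separate "sport" cluster. This mirrors the abstract's idea that one category can span several components, and it is why a new text close to that second component is still labeled "sport".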

Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 75, February 2015, Pages 152–160