کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
407603 678158 2013 10 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Towards enhancing centroid classifier for text classification—A border-instance approach
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
Towards enhancing centroid classifier for text classification—A border-instance approach
چکیده انگلیسی

Text classification/categorization (TC) is to assign new unlabeled natural language documents to the predefined thematic categories. Centroid-based classifier (CC) has been widely used for TC because of its simplicity and efficiency. However, it has also been long criticized for its relatively low classification accuracy compared with state-of-the-art classifiers such as support vector machines (SVMs). In this paper, we find that for CC using only border instances rather than all instances to construct centroid vectors can obtain higher generalization accuracy. Along this line, we propose Border-Instance-based Iteratively Adjusted Centroid Classifier (IACC_BI), which relies on the border instances found by some routines, e.g. 1-Nearest-and-1-Furthest-Neighbors strategy, to construct centroid vectors for CC. IACC_BI then iteratively adjusts the initial centroid vectors according to the misclassified training instances. Our extensive experiments on 11 real-world text corpora demonstrate that IACC_BI improves the performance of centroid-based classifiers greatly and obtains classification accuracy competitive to the well-known SVMs, while at significantly lower computational costs.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Neurocomputing - Volume 101, 4 February 2013, Pages 299–308
نویسندگان
, , , , ,