کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
515290 866979 2006 20 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Hierarchical document categorization with k-NN and concept-based thesauri
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نرم افزارهای علوم کامپیوتر
پیش نمایش صفحه اول مقاله
Hierarchical document categorization with k-NN and concept-based thesauri
چکیده انگلیسی

In this paper, we propose a new algorithm, which incorporates the relationships of concept-based thesauri into the document categorization using the k-NN classifier (k-NN). k-NN is one of the most popular document categorization methods because it shows relatively good performance in spite of its simplicity. However, it significantly degrades precision when ambiguity arises, i.e., when there exist more than one candidate category to which a document can be assigned. To remedy the drawback, we employ concept-based thesauri in the categorization. Employing the thesaurus entails structuring categories into hierarchies, since their structure needs to be conformed to that of the thesaurus for capturing relationships between categories. By referencing various relationships in the thesaurus corresponding to the structured categories, k-NN can be prominently improved, removing the ambiguity. In this paper, we first perform the document categorization by using k-NN and then employ the relationships to reduce the ambiguity. Experimental results show that this method improves the precision of k-NN up to 13.86% without compromising its recall.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Processing & Management - Volume 42, Issue 2, March 2006, Pages 387–406
نویسندگان
, , ,