Article code: 515202 | Journal code: 866968 | Year of publication: 2007 | English article: 9 pages | Full text: PDF
ISI article title (English)
Contextual feature selection for text classification
Related subjects
Engineering and Basic Sciences; Computer Engineering; Computer Science Software
Abstract

We present a simple approach for the classification of “noisy” documents using bigrams and named entities. The approach combines conventional feature selection with a contextual approach to filter out passages around selected features. Originally designed for call for tender documents, the method can be useful for other web collections that also contain non-topical contents. Experiments are conducted on our in-house collection as well as on the 4-Universities data set, Reuters 21578 and 20 Newsgroups. We find a significant improvement on our collection and the 4-Universities data set (10.9% and 4.1%, respectively). Although the best results are obtained by combining bigrams and named entities, the impact of the latter is not found to be significant.
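
The abstract describes two stages: a conventional feature selection step over bigrams, followed by a contextual step that filters text around the selected features. The sketch below is one possible reading of that pipeline, not the authors' implementation. The abstract does not name the selection criterion, the context size, or how "passages" are delimited, so the following are all assumptions: chi-square scoring on bigram presence, a fixed token window around each selected bigram, and keeping only the text inside those windows while discarding the rest as likely non-topical. Named entities are omitted because they would require an external NER tool. All function names and the toy data are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bigrams(tokens):
    """Return adjacent token pairs joined by a single space."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def chi_square_scores(docs, labels, target):
    """Chi-square score of every bigram against one target class.

    Assumption: chi-square on document-level presence is used here as a
    stand-in for the unspecified "conventional feature selection" step.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos
    in_pos, in_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for feature in set(bigrams(tokens)):  # count each bigram once per doc
            if y == target:
                in_pos[feature] += 1
            else:
                in_neg[feature] += 1
    scores = {}
    for feature in set(in_pos) | set(in_neg):
        a, b = in_pos[feature], in_neg[feature]  # docs containing the bigram
        c, d = n_pos - a, n_neg - b              # docs lacking the bigram
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[feature] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring features, breaking ties alphabetically."""
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return set(ranked[:k])


def contextual_filter(tokens, selected, window=5):
    """Keep only tokens within `window` positions of a selected bigram.

    Hypothetical context rule: `window` tokens before the bigram starts and
    `window` tokens after it ends; overlapping windows merge.
    """
    keep = [False] * len(tokens)
    for i in range(len(tokens) - 1):
        if f"{tokens[i]} {tokens[i + 1]}" in selected:
            start = max(0, i - window)
            end = min(len(tokens), i + 2 + window)
            for j in range(start, end):
                keep[j] = True
    return [t for t, k in zip(tokens, keep) if k]


if __name__ == "__main__":
    train = [
        ("call for tender software development services deadline march", "tender"),
        ("call for tender road construction works deadline april", "tender"),
        ("football match report final score", "other"),
        ("weather report sunny weekend", "other"),
    ]
    docs = [tokenize(text) for text, _ in train]
    labels = [label for _, label in train]
    selected = select_top_k(chi_square_scores(docs, labels, "tender"), k=2)
    noisy = tokenize("cookie policy login page header call for tender "
                     "bridge repair footer links contact us")
    # prints ['page', 'header', 'call', 'for', 'tender', 'bridge', 'repair']
    print(contextual_filter(noisy, selected, window=2))
```

On the toy data, "call for" and "for tender" are the two top-ranked bigrams, so the page-template tokens at either end of the noisy document ("cookie policy", "footer links contact us") fall outside every window and are dropped before classification. The filtered token list would then be passed to an ordinary classifier. That last stage is not shown.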

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 43, Issue 2, March 2007, Pages 344–352