Article code: 384811
Journal code: 660855
Publication year: 2012
English article: 9 pages, PDF
Article title (ISI)
Comparison of term frequency and document frequency based feature selection metrics in text categorization
Related subjects
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
Abstract

Text categorization plays an important role in applications where information is filtered, monitored, personalized, categorized, organized or searched. Feature selection remains an effective and efficient technique in text categorization. Feature selection metrics are commonly based on either the term frequency or the document frequency of a word. We focus on the relative importance of these two frequencies for feature selection metrics. For this purpose, two document frequency based metrics, the discriminative power measure and the GINI index, were also examined in a term frequency based form. The metrics were compared and analyzed on the Reuters-21578 dataset. Experimental results revealed that the term frequency based metrics may be useful, especially for smaller feature sets. Two characteristics of term frequency based metrics were observed by analyzing the scatter of features among classes and the rate at which information in the data was covered. These characteristics may contribute to their superior performance for smaller feature sets.
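
As a rough illustration of the document frequency versus term frequency distinction, the sketch below scores terms with one commonly used form of the GINI index for text feature selection, Gini(t) = Σ_i P(t|C_i)² P(C_i|t)², estimating the probabilities either from document counts or from raw term occurrence counts. The function names `gini_scores` and `top_k`, the toy corpus, and the term frequency estimator (occurrences of t in class C_i divided by all tokens in C_i) are hypothetical choices for illustration, not necessarily the formulation used in the paper.

```python
from collections import Counter, defaultdict


def gini_scores(docs, labels, use_tf=False):
    """Score terms with a GINI-index-style metric (illustrative sketch).

    Gini(t) = sum_i P(t|C_i)^2 * P(C_i|t)^2, with probabilities estimated
    from document counts (use_tf=False) or from raw term occurrence counts
    (use_tf=True). The TF estimator here is an assumption, not the paper's.
    """
    class_total = Counter()            # docs per class (DF) or tokens per class (TF)
    term_class = defaultdict(Counter)  # term -> class -> count
    for tokens, label in zip(docs, labels):
        if use_tf:
            class_total[label] += len(tokens)
            for term, n in Counter(tokens).items():
                term_class[term][label] += n      # every occurrence counts
        else:
            class_total[label] += 1
            for term in set(tokens):
                term_class[term][label] += 1      # at most once per document

    scores = {}
    for term, per_class in term_class.items():
        term_total = sum(per_class.values())
        score = 0.0
        for label, n in per_class.items():
            p_t_given_c = n / class_total[label]  # P(t | C_i)
            p_c_given_t = n / term_total          # P(C_i | t)
            score += p_t_given_c ** 2 * p_c_given_t ** 2
        scores[term] = score
    return scores


def top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical toy corpus: two classes, four tokenized documents.
    docs = [["wheat", "price", "wheat"], ["wheat", "crop"],
            ["oil", "price"], ["oil", "barrel", "oil"]]
    labels = ["grain", "grain", "crude", "crude"]
    print(top_k(gini_scores(docs, labels), 2))
    print(top_k(gini_scores(docs, labels, use_tf=True), 2))
```

Switching `use_tf` changes only how the per-class counts are gathered; the scoring formula itself stays the same, which is the kind of substitution the abstract compares.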

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 39, Issue 5, April 2012, Pages 4760–4768