Article code: 383706 | Journal code: 660830 | Publication year: 2013 | Full text: English, 16 pages (PDF)
Title
Comparison of text feature selection policies and using an adaptive framework
Subject areas
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
Abstract

Text categorization is the task of automatically assigning unlabeled text documents to predefined category labels by means of an induction algorithm. Because text categorization data are high-dimensional, feature selection is often used to reduce the dimensionality. In this paper, we evaluate and compare the feature selection policies used in text categorization, employing several popular feature selection metrics. The experiments use datasets that vary in size, complexity, and skewness, with a support vector machine (SVM) as the classifier and tf-idf term weighting. In addition to evaluating these policies, we propose new feature selection metrics that achieve high success rates, especially when the number of keywords is low. These metrics are two-sided local metrics based on the difference between the distribution of a term in the documents belonging to a class and its distribution in the documents not belonging to that class. Moreover, we propose a keyword selection framework called adaptive keyword selection. It selects a different number of terms for each class and yields a significant improvement on skewed datasets in which some classes have only a limited number of training instances.
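The abstract describes the new metrics only as "two-sided local" scores built on the difference between a term's distribution inside and outside a class; the exact formulas are not given here. As a hedged illustration, the Python sketch below uses a hypothetical stand-in score, |P(t | c) - P(t | not c)|, with each probability estimated as a document-frequency rate. It assumes documents are already tokenized into sets of terms, and the names `two_sided_scores` and `top_k` are illustrative, not from the paper. "Local" means the score is computed per class; "two-sided" means a term can score highly either because it is common inside the class or because it is common outside it.

```python
from collections import Counter
from typing import Dict, List, Set


def two_sided_scores(docs: List[Set[str]], labels: List[str], target: str) -> Dict[str, float]:
    """Hypothetical local, two-sided score of every term for class `target`.

    score(t, c) = |P(t | c) - P(t | not c)|, where P(t | c) is the fraction
    of documents in class c that contain term t. This is an illustrative
    stand-in, not the metric defined in the paper.
    """
    in_class = [d for d, y in zip(docs, labels) if y == target]
    out_class = [d for d, y in zip(docs, labels) if y != target]
    # Document frequency of each term inside and outside the class.
    df_in = Counter(t for d in in_class for t in d)
    df_out = Counter(t for d in out_class for t in d)
    vocab = set(df_in) | set(df_out)
    # Guard against empty classes to avoid division by zero.
    n_in, n_out = max(len(in_class), 1), max(len(out_class), 1)
    return {t: abs(df_in[t] / n_in - df_out[t] / n_out) for t in vocab}


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy usage: four tokenized documents in two classes.
docs = [{"goal", "match", "team"}, {"team", "coach"}, {"vote", "election"}, {"vote", "party", "team"}]
labels = ["sport", "sport", "politics", "politics"]
scores = two_sided_scores(docs, labels, "sport")
print(top_k(scores, 3))  # ['vote', 'coach', 'election']
```

Here "vote" ranks first for the sport class even though it never occurs in a sport document: its absence is strong evidence, which is the kind of negative signal a one-sided metric would ignore.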


► A comprehensive analysis of feature selection metrics is given.
► New feature selection metrics are introduced.
► An adaptive keyword selection method is proposed.
► The performance of local and global feature selection is compared.
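The highlights contrast local and global feature selection, and the abstract describes adaptive keyword selection as choosing a different number of terms for each class. The Python sketch below illustrates these three ideas under stated assumptions: the local policy pools the top k_c terms of each class, the global policy collapses per-class scores with max (one common choice; the paper's globalization function is not stated here), and `hypothetical_allocation` is an invented rule (budget split in proportion to class size, with a minimum per class), not the paper's method. Other allocation rules are equally plausible. All function names are illustrative.

```python
from typing import Dict, List, Set

Scores = Dict[str, Dict[str, float]]  # class -> term -> score


def ranked(scores: Dict[str, float]) -> List[str]:
    """Terms ordered by descending score, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))


def local_policy(per_class: Scores, k_per_class: Dict[str, int]) -> Set[str]:
    """Local policy: take the top k_c terms of each class and pool them.

    With the same k for every class this is plain local selection; letting
    k_c vary by class is the idea behind adaptive keyword selection.
    """
    selected: Set[str] = set()
    for c, scores in per_class.items():
        selected.update(ranked(scores)[:k_per_class[c]])
    return selected


def global_policy(per_class: Scores, k: int) -> Set[str]:
    """Global policy: combine per-class scores into one (max here), keep the top k."""
    combined: Dict[str, float] = {}
    for scores in per_class.values():
        for t, s in scores.items():
            combined[t] = max(s, combined.get(t, float("-inf")))
    return set(ranked(combined)[:k])


def hypothetical_allocation(class_sizes: Dict[str, int], budget: int, floor: int = 1) -> Dict[str, int]:
    """Assumed rule, not from the paper: split `budget` in proportion to
    class size, but give every class at least `floor` terms."""
    total = sum(class_sizes.values())
    return {c: max(floor, round(budget * n / total)) for c, n in class_sizes.items()}


# Toy usage with made-up per-class scores.
per_class = {
    "sport": {"goal": 0.9, "match": 0.8, "team": 0.4},
    "politics": {"vote": 0.7, "party": 0.6, "team": 0.3},
}
print(sorted(local_policy(per_class, {"sport": 2, "politics": 2})))  # ['goal', 'match', 'party', 'vote']
print(sorted(global_policy(per_class, 2)))  # ['goal', 'match']
k = hypothetical_allocation({"sport": 90, "politics": 10}, budget=10)
print(k)  # {'sport': 9, 'politics': 1}
print(sorted(local_policy(per_class, k)))  # ['goal', 'match', 'team', 'vote']
```

In this toy example the global policy keeps only sport terms, while the local policy guarantees every class some keywords. This shows how a single global ranking can leave a class without keywords, the kind of skewed setting the abstract says adaptive selection is aimed at.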

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 40, Issue 12, 15 September 2013, Pages 4871–4886