Comparison of text feature selection policies and using an adaptive framework

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
383706	660830	2013	16 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Feature selection - انتخاب ویژگی Document categorization - طبقه بندی اسناد Support vector machines - ماشین بردار پشتیبانی

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی

پیش نمایش صفحه اول مقاله

Comparison of text feature selection policies and using an adaptive framework

چکیده انگلیسی

Text categorization is the task of automatically assigning unlabeled text documents to some predefined category labels by means of an induction algorithm. Since the data in text categorization are high-dimensional, often feature selection is used for reducing the dimensionality. In this paper, we make an evaluation and comparison of the feature selection policies used in text categorization by employing some of the popular feature selection metrics. For the experiments, we use datasets which vary in size, complexity, and skewness. We use support vector machine as the classifier and tf-idf weighting for weighting the terms. In addition to the evaluation of the policies, we propose new feature selection metrics which show high success rates especially with low number of keywords. These metrics are two-sided local metrics and are based on the difference of the distributions of a term in the documents belonging to a class and in the documents not belonging to that class. Moreover, we propose a keyword selection framework called adaptive keyword selection. It is based on selecting different number of terms for each class and it shows significant improvement on skewed datasets that have a limited number of training instances for some of the classes.

► A comprehensive analysis of feature selection metrics is given.
► New feature selection metrics are introduced.
► Adaptive keyword selection method is proposed.
► Local and global feature selection performances are compared.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Expert Systems with Applications - Volume 40, Issue 12, 15 September 2013, Pages 4871–4886

نویسندگان

Şerafettin Taşcı, Tunga Güngör,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Comparison of text feature selection policies and using an adaptive framework

دسترسی سریع

ارتباط

English Website