
Feature selection, which reduces the dimensionality of the vector space without sacrificing classifier performance, is widely used in text categorization. In this paper, we propose a new feature selection algorithm, named CMFS, which comprehensively measures the significance of a term both inter-category and intra-category. We evaluated CMFS on three benchmark document collections, 20-Newsgroups, Reuters-21578 and WebKB, using two classification algorithms, Naïve Bayes (NB) and Support Vector Machines (SVMs). Experimental results comparing CMFS with six well-known feature selection algorithms show that CMFS is significantly superior to Information Gain (IG), the Chi statistic (CHI), Document Frequency (DF), Orthogonal Centroid Feature Selection (OCFS) and the DIA association factor (DIA) when the Naïve Bayes classifier is used, and significantly outperforms IG, DF, OCFS and DIA when Support Vector Machines are used.
► Each term's significance is measured comprehensively, both inter-category and intra-category.
► We compared the proposed method with six well-known feature selection algorithms.
► The proposed algorithm can significantly improve the performance of classifiers.
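The abstract does not state the CMFS formula itself. One plausible reading of a score that combines intra-category representativeness with inter-category specificity is the per-category product P(t|c)·P(c|t), pooled over categories. The sketch below is a hypothetical illustration under that assumption only; the function name `cmfs_scores`, the add-one smoothing, and the max-over-categories pooling are illustrative choices, not the paper's exact estimator.

```python
import numpy as np

def cmfs_scores(tf, smoothing=1.0):
    """Illustrative CMFS-style term scoring (hypothetical reconstruction).

    tf: (n_terms, n_categories) matrix of term frequencies per category.
    Combines an intra-category factor P(t|c) with an inter-category
    factor P(c|t), then pools by taking the max over categories.
    """
    tf = np.asarray(tf, dtype=float)
    n_terms, n_cats = tf.shape
    # Intra-category: how representative the term is of each category.
    p_t_given_c = (tf + smoothing) / (tf.sum(axis=0, keepdims=True) + smoothing * n_terms)
    # Inter-category: how concentrated the term is in each category.
    p_c_given_t = (tf + smoothing) / (tf.sum(axis=1, keepdims=True) + smoothing * n_cats)
    per_category = p_t_given_c * p_c_given_t
    return per_category.max(axis=1)

if __name__ == "__main__":
    # Toy term-by-category frequency matrix.
    tf = np.array([[30, 2, 1],    # term concentrated in category 0
                   [10, 12, 11],  # term spread evenly across categories
                   [0, 25, 3]])   # term concentrated in category 1
    scores = cmfs_scores(tf)
    top_k = np.argsort(scores)[::-1][:2]   # keep the top-k terms by score
    print(scores, top_k)
```

In such a scheme, feature selection keeps the top-k highest-scoring terms before training the NB or SVM classifier; terms spread evenly across categories receive low scores and are discarded.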
Journal: Information Processing & Management - Volume 48, Issue 4, July 2012, Pages 741–754