Using micro-documents for feature selection: The case of ordinal text classification

Article code	Journal code	Publication year	English article
382698	660778	2013	10-page PDF

Keywords

Feature selection; Ordinal regression; Text classification; Supervised learning

Related topics

Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence

Abstract

Most popular feature selection methods for text classification, such as information gain (also known as “mutual information”), chi-square, and odds ratio, are based on binary information indicating the presence or absence of the feature (or “term”) in each training document. As such, these methods do not exploit a rich source of information, namely how frequently the feature occurs in the training document (term frequency). To overcome this drawback, when doing feature selection we logically break down each training document of length k into k training “micro-documents”, each consisting of a single word occurrence and endowed with the same class information as the original training document. This move has the double effect of (a) keeping all the original feature selection methods based on binary information straightforwardly applicable, and (b) making them sensitive to term frequency information. We study the impact of this strategy in the case of ordinal text classification, a type of text classification that deals with classes lying on an ordinal scale and that has recently been made popular by applications in customer relationship management, market research, and Web 2.0 mining. We run experiments using four recently introduced feature selection functions, two learning methods of the support vector machines family, and two large datasets of product reviews. The experiments show that the use of this strategy substantially improves the accuracy of ordinal text classification.

► A novel feature selection method for text classification is described which exploits term frequency information.
► The method can be used to generate variants of all the popular feature selection metrics.
► The method is studied experimentally in the context of “ordinal” text classification.
► Experiments are run using four feature selection functions, two learning methods, and two large datasets.
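
The micro-document idea from the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy reviews, the function names (`to_micro_documents`, `presence_counts`, `chi_square`), and the choice of plain chi-square as the scoring function are all assumptions made for illustration (the paper itself evaluates four feature selection functions designed for ordinal classification). The point is only that the same presence/absence statistic becomes sensitive to term frequency once it is computed over micro-documents.

```python
from collections import Counter


def to_micro_documents(docs):
    """Split each (tokens, label) document into one single-token
    micro-document per word occurrence, keeping the original label."""
    return [([tok], label) for tokens, label in docs for tok in tokens]


def presence_counts(docs, term):
    """Per class: number of documents containing `term` at least once,
    and total number of documents (binary presence/absence only)."""
    present, total = Counter(), Counter()
    for tokens, label in docs:
        total[label] += 1
        if term in tokens:
            present[label] += 1
    return present, total


def chi_square(docs, term, target):
    """Chi-square of the 2x2 table (term present/absent) x (target/other class).
    Illustrative scoring function, assumed here for the example."""
    present, total = presence_counts(docs, term)
    n = sum(total.values())
    a = present[target]                 # term present, target class
    b = sum(present.values()) - a       # term present, other classes
    c = total[target] - a               # term absent, target class
    d = n - a - b - c                   # term absent, other classes
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Hypothetical toy data: two reviews with star ratings 5 and 1.
docs = [
    (["great", "great", "great", "phone"], 5),
    (["great", "battery", "poor", "poor"], 1),
]
micro = to_micro_documents(docs)

doc_level_score = chi_square(docs, "great", target=5)
micro_level_score = chi_square(micro, "great", target=5)
print(len(micro), doc_level_score, micro_level_score)
```

In this toy case “great” occurs in both reviews, so the document-level presence/absence table cannot separate the classes, whereas at the micro-document level the three occurrences in the 5-star review against one in the 1-star review produce a nonzero score.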

Publisher

Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 40, Issue 11, 1 September 2013, Pages 4687–4696

Authors

Stefano Baccianella, Andrea Esuli, Fabrizio Sebastiani
