Free article download: t-Test feature selection approach based on term frequency for text categorization

Article code	Journal code	Publication year	English article	Full-text version
534282	870244	2014	10-page PDF	Free download

English title of the ISI article

t-Test feature selection approach based on term frequency for text categorization

Keywords

Feature selection, term frequency, Student's t-test, text classification

Related subjects

Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition

Abstract

• We prove that the frequency distribution of a term is approximately normal.
• We model the diversity of a term's frequency with a t-test.
• We verify our approach on two text corpora with three classifiers.
• Our approach is comparable to, or even better than, the state-of-the-art methods.

Feature selection techniques play an important role in text categorization (TC), especially for large-scale TC tasks. Many new and improved methods have been proposed, and most of them are based on document frequency, such as the well-known Chi-square statistic and information gain. These document-frequency-based methods, however, have two shortcomings: (1) they are not reliable for low-frequency terms, that is, low-frequency terms are filtered out because of their smaller weights; and (2) they only record whether a term occurs in a document and ignore term frequency. In practice, a high-frequency term (other than a stop word) that occurs in only a few documents is often regarded as a discriminator in real-life corpora. To address these drawbacks, the paper focuses on how to construct a feature selection function based on term frequency, and proposes a new approach using Student's t-test. The t-test function is used to measure the diversity of the distributions of a term's frequency between a specific category and the entire corpus. Extensive comparative experiments on two text corpora using three classifiers show that the proposed approach is comparable to state-of-the-art feature selection methods in terms of macro-F1 and micro-F1. On micro-F1 in particular, our method achieves slightly better performance on Reuters with kNN and SVM classifiers, compared with χ² and IG.
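The abstract does not state the scoring formula, so the following is only a minimal sketch of the general idea, under stated assumptions. It scores a term by a plain one-sample t-statistic that compares the term's per-document relative frequency inside a category with its mean frequency over the whole corpus. The function names (`relative_tf`, `t_score`, `select_features`), the max-over-categories ranking rule, and the handling of zero variance are all hypothetical choices made for illustration, not the authors' exact function.

```python
import math
from statistics import mean, stdev


def relative_tf(doc_tokens, term):
    """Relative frequency of `term` in one tokenized document."""
    if not doc_tokens:
        return 0.0
    return doc_tokens.count(term) / len(doc_tokens)


def t_score(docs, labels, term, category):
    """One-sample t-statistic (an assumed, simplified form): gap between the
    term's mean frequency in `category` and in the whole corpus, divided by
    the standard error of the in-category mean."""
    tf_all = [relative_tf(d, term) for d in docs]
    tf_cat = [tf for tf, y in zip(tf_all, labels) if y == category]
    if len(tf_cat) < 2:
        return 0.0  # too few documents to estimate a variance
    diff = abs(mean(tf_cat) - mean(tf_all))
    std_err = stdev(tf_cat) / math.sqrt(len(tf_cat))
    if std_err == 0.0:
        # Zero spread inside the category: any nonzero gap counts as maximal.
        return 0.0 if diff == 0.0 else math.inf
    return diff / std_err


def select_features(docs, labels, k):
    """Rank terms by their largest t-score over all categories; keep the top k."""
    vocab = sorted({tok for d in docs for tok in d})
    cats = sorted(set(labels))
    scores = {t: max(t_score(docs, labels, t, c) for c in cats) for t in vocab}
    return sorted(vocab, key=lambda t: scores[t], reverse=True)[:k]


# Toy corpus (invented for illustration only).
docs = [
    ["goal", "goal", "match", "team"],
    ["goal", "team", "team", "win"],
    ["vote", "party", "vote", "law"],
    ["party", "law", "team", "vote"],
]
labels = ["sport", "sport", "politics", "politics"]

print(t_score(docs, labels, "goal", "sport"))
print(select_features(docs, labels, 6))
```

In this toy corpus "team" appears in both categories, so it receives the lowest score and is the one term dropped when six of the seven terms are kept. On a real corpus the degenerate zero-variance case would need more careful treatment than the infinite score used here.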

Publisher

Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 45, 1 August 2014, Pages 1–10

Authors

Deqing Wang, Hui Zhang, Rui Liu, Weifeng Lv, Datao Wang
