Article code: 6862765
Journal code: 677015
Publication year: 2013
English article: 12-page PDF
Full text: Free download
English title of the ISI article
Feature selection via maximizing global information gain for text classification
Persian translation of the title
انتخاب ویژگی از طریق به حداکثر رساندن افزایش اطلاعات جهانی برای طبقه بندی متن
Keywords
Feature selection, Text classification, High dimensionality, Distributional clustering, Information bottleneck
Related subjects
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
English abstract
Feature selection is a vital preprocessing step for the text classification task, used to alleviate the curse-of-dimensionality problem. Most existing metrics (such as information gain) only evaluate features individually and completely ignore the redundancy between them. This can decrease the overall discriminative power, because one feature's predictive power is weakened by others. On the other hand, although all higher-order algorithms (such as mRMR) take redundancy into account, their high computational complexity makes them impractical in the text domain. This paper proposes a novel metric called global information gain (GIG), which avoids redundancy naturally. An efficient feature selection method called maximizing global information gain (MGIG) is also presented. We compare MGIG with four other algorithms on six datasets; the experimental results show that MGIG outperforms the other methods in most cases. Moreover, MGIG runs significantly faster than traditional higher-order algorithms, which makes it a suitable choice for feature selection in the text domain.
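For context, the Python sketch below is not the paper's MGIG algorithm; it only illustrates the individual-scoring baseline the abstract contrasts it with: classic information gain, which ranks each term on its own and therefore cannot detect redundancy between terms. The tiny document-term matrix, labels, and term names are made-up examples.

# Minimal sketch of per-feature information gain ranking for text
# classification (the baseline discussed in the abstract, not MGIG).
# All data below are hypothetical.
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(presence, labels):
    """IG(C; t) = H(C) - H(C | t) for a binary term-presence column."""
    h_c = entropy(labels)
    h_c_given_t = 0.0
    for value in (0, 1):
        mask = presence == value
        if mask.any():
            h_c_given_t += mask.mean() * entropy(labels[mask])
    return h_c - h_c_given_t

# Toy binary document-term matrix (rows: documents, columns: terms).
X = np.array([[1, 0, 1, 0],
              [1, 1, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 0, 1]])
y = np.array([0, 0, 1, 1])                  # class labels
terms = ["ball", "goal", "match", "election"]

# Score every term independently and keep the 2 highest-scoring ones.
scores = [information_gain(X[:, j], y) for j in range(X.shape[1])]
top_k = np.argsort(scores)[::-1][:2]
print([terms[j] for j in top_k])

Note that the columns for the hypothetical terms "ball" and "match" are identical, yet information gain scores each of them maximally on its own; this is exactly the redundancy that individual metrics overlook and that higher-order methods such as mRMR, and the paper's GIG/MGIG, are meant to address.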
Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 54, December 2013, Pages 298-309
Authors