Article code: 4943360
Journal code: 1437625
Publication year: 2017
English article: 14 pages, PDF
Full-text version: free download
English title of the ISI article
Variable Global Feature Selection Scheme for automatic classification of text documents
Persian translation of the title
طرح انتخاب متغیر جهانی برای طبقه بندی خودکار اسناد متنی
Keywords
Feature selection, text document classification, text mining, text analysis
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
Feature selection is important for speeding up Automatic Text Document Classification (ATDC). At present, the most common method for selecting discriminating features is based on the Global Filter-based Feature Selection Scheme (GFSS). The GFSS assigns a score to each feature based on its discriminating power and selects the top-N features from the feature set, where N is an empirically determined number. As a result, the features of a few classes may be discarded either partially or completely. The Improved Global Feature Selection Scheme (IGFSS) solves this issue by selecting an equal number of representative features from all the classes. However, it struggles with unbalanced datasets that have a large number of classes, where the distribution of features across the classes is highly variable. In this case, if an equal number of features is chosen from each class, some important features may be excluded from the classes that contain more features. To overcome this problem, we propose a novel Variable Global Feature Selection Scheme (VGFSS) that selects a variable number of features from each class based on the distribution of terms in the classes. It ensures that a minimum number of terms is selected from each class. The numerical results on benchmark datasets show the effectiveness of the proposed VGFSS algorithm over classical information science methods and IGFSS.
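The three selection strategies contrasted in the abstract can be illustrated with a small Python sketch. The abstract does not give the scoring function or the exact allocation rule, so everything here is an assumption made for illustration: each term is tagged with one class label and a precomputed score, and the variable scheme uses a simple proportional quota with a floor. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Toy input (hypothetical): term -> (class label, discriminating score).
features = {
    "goal": ("sport", 0.9), "match": ("sport", 0.8),
    "team": ("sport", 0.7), "coach": ("sport", 0.6),
    "vote": ("politics", 0.5), "law": ("politics", 0.4),
    "virus": ("health", 0.2),
}

def select_global(features, n):
    """GFSS-style: rank all terms by score and keep the top n."""
    ranked = sorted(features, key=lambda t: features[t][1], reverse=True)
    return ranked[:n]

def group_by_class(features):
    """Group terms by class label, best-scoring terms first."""
    groups = defaultdict(list)
    for term, (label, score) in features.items():
        groups[label].append((score, term))
    for label in groups:
        groups[label].sort(reverse=True)
    return groups

def select_equal(features, n):
    """IGFSS-style: take the same number of top terms from every class."""
    groups = group_by_class(features)
    per_class = n // len(groups)
    selected = []
    for label in sorted(groups):
        selected += [term for _, term in groups[label][:per_class]]
    return selected

def select_variable(features, n, minimum=1):
    """VGFSS-style sketch: quota proportional to each class's share of terms.

    The proportional rule is an assumption; the abstract only states that the
    number varies with the term distribution and that a minimum is guaranteed.
    Because of the floor and the minimum, the result may not contain exactly n terms.
    """
    groups = group_by_class(features)
    total = len(features)
    selected = []
    for label in sorted(groups):
        quota = max(minimum, n * len(groups[label]) // total)
        selected += [term for _, term in groups[label][:quota]]
    return selected

print(select_global(features, 4))
print(select_equal(features, 3))
print(select_variable(features, 4))
```

On this toy data the global ranking keeps only terms from the largest class, the equal scheme keeps one term per class, and the variable scheme gives the largest class a bigger quota while still keeping at least one term from the smallest class.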
Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 81, 15 September 2017, Pages 268-281