Article code: 404987
Journal code: 677469
Publication year: 2015
English article: 13-page PDF
Full-text version: Free download
English title of the ISI article
Novel feature selection method based on harmony search for email classification
Keywords
Feature selection, Document frequency, Term frequency, Parameter optimization, Harmony search
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract

Feature selection is often used in email classification to reduce the dimensionality of the feature space. In this study, a new feature selection method that combines document frequency and term frequency (DTFS) is proposed to improve the performance of email classification. First, an existing optimal document frequency based feature selection method (ODFFS) and a predetermined threshold are applied to select the most discriminative features. Second, an existing optimal term frequency based feature selection method (OTFFS) and another predetermined threshold are applied to select more discriminative features. Finally, ODFFS and OTFFS are combined to select the remaining features. To improve the convergence rate of parameter optimization, a metaheuristic method, namely global best harmony oriented harmony search (GBHS), is proposed to search for the optimal values of these predetermined thresholds. Experiments with fuzzy Support Vector Machine (FSVM) and Naïve Bayesian (NB) classifiers are conducted on six corpora: PU2, CSDMC2010, PU3, Lingspam, Enron-spam and Trec2007. Experimental results show that DTFS outperforms other methods on the six corpora, such as Chi-square, comprehensively measure feature selection, t-test based feature selection, term frequency based information gain, the two-step based hybrid feature selection method and the improved term frequency inverse document frequency method.
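
The threshold search described in the abstract can be illustrated with a minimal sketch of the standard (basic) harmony search procedure. This is not the GBHS variant proposed in the paper, whose details the abstract does not give. The function names, the parameter values (hms, hmcr, par, bw) and the toy objective are all assumptions made for illustration; in the paper's setting the objective would presumably be a classification error measured for a given pair of thresholds.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.05,
                   iterations=2000, seed=0):
    """Basic harmony search that minimizes `objective` inside box `bounds`.

    hms: harmony memory size, hmcr: memory considering rate,
    par: pitch adjusting rate, bw: bandwidth as a fraction of each range.
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Harmony memory: hms random candidate vectors and their scores.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]

    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a value from a stored harmony.
                value = memory[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: small random shift of that value.
                    value += rng.uniform(-1.0, 1.0) * bw * (hi - lo)
            else:
                # Random selection from the whole allowed range.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))  # keep inside the bounds

        # Replace the worst stored harmony if the new one is better.
        new_score = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score

    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def toy_error(thresholds):
    """Hypothetical stand-in for classifier error as a function of the
    document frequency and term frequency thresholds."""
    t_df, t_tf = thresholds
    return (t_df - 0.3) ** 2 + (t_tf - 0.7) ** 2


best_thresholds, best_error = harmony_search(
    toy_error, [(0.0, 1.0), (0.0, 1.0)])
```

Here the two search dimensions stand for the two predetermined thresholds. Replacing `toy_error` with a function that runs feature selection and a classifier on held-out data would turn the sketch into an actual parameter search, at a much higher cost per evaluation.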

Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 73, January 2015, Pages 311–323