Using GMDH-based networks for improved spam detection and email feature analysis

Article code	Journal code	Publication year	English article
496920	862873	2011	12 pages (PDF)

Keywords

Feature selection; Spam detection; Group method of data handling; Neural networks; Bayesian classifiers; Spam filtering; Soft computing

Related subjects

Engineering and Basic Sciences; Computer Engineering; Computer Science Software

Abstract

Unsolicited or spam email has recently become a major threat that can negatively impact the usability of electronic mail. Spam substantially wastes time and money for business users and network administrators, consumes network bandwidth and storage space, and slows down email servers. In addition, it provides a medium for distributing harmful code and/or offensive content. In this paper, we explore the application of the GMDH (Group Method of Data Handling) based inductive learning approach in detecting spam messages by automatically identifying content features that effectively distinguish spam from legitimate emails. We study the performance for various network model complexities using spambase, a publicly available benchmark dataset. Results reveal that classification accuracies of 91.7% can be achieved using only 10 out of the available 57 attributes, selected through abductive learning as the most effective feature subset (i.e. 82.5% data reduction). We also show how to improve classification performance using abductive network ensembles (committees) trained on different subsets of the training data. Comparison with other techniques such as neural networks and naïve Bayesian classifiers shows that the GMDH-based learning approach can provide better spam detection accuracy with false-positive rates as low as 4.3% and yet requires shorter training time.
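The figures quoted in the abstract can be illustrated with a small sketch. It reproduces the stated data reduction (10 of 57 attributes) and shows one simple way a committee of classifiers could be combined and scored by false-positive rate. The abstract does not say how the committee members are combined, so the majority-vote rule, the function names, and the toy labels below are all assumptions made for illustration, not the authors' method or data.

```python
def data_reduction(selected: int, total: int) -> float:
    """Fraction of attributes discarded when only `selected` of `total` are kept."""
    return 1.0 - selected / total


def majority_vote(member_predictions):
    """Combine per-member label lists (1 = spam, 0 = legitimate).

    Assumption: simple majority voting; ties fall back to 'legitimate'.
    """
    combined = []
    for votes in zip(*member_predictions):
        combined.append(1 if 2 * sum(votes) > len(votes) else 0)
    return combined


def false_positive_rate(y_true, y_pred):
    """Share of legitimate emails (label 0) that were flagged as spam (label 1)."""
    legit_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(legit_preds) / len(legit_preds) if legit_preds else 0.0


if __name__ == "__main__":
    # 10 of the 57 spambase attributes are kept, as stated in the abstract.
    print(f"data reduction: {data_reduction(10, 57):.1%}")

    # Toy labels and toy member outputs (hypothetical, not from the paper).
    y_true = [1, 0, 0, 1, 0]
    members = [
        [1, 0, 1, 1, 0],
        [1, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
    ]
    y_committee = majority_vote(members)
    print("committee output:", y_committee)
    print(f"false-positive rate: {false_positive_rate(y_true, y_committee):.3f}")
```

Sending ties to the legitimate class is a deliberate choice in this sketch: in spam filtering, wrongly discarding a legitimate message is usually considered costlier than letting a spam message through.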

Publisher

Database: Elsevier - ScienceDirect
Journal: Applied Soft Computing - Volume 11, Issue 1, January 2011, Pages 477-488

Authors

El-Sayed M. El-Alfy, Radwan E. Abdel-Aal
