کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
536674 870603 2008 15 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Time-efficient spam e-mail filtering using n-gram models
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر چشم انداز کامپیوتر و تشخیص الگو
پیش نمایش صفحه اول مقاله
Time-efficient spam e-mail filtering using n-gram models
چکیده انگلیسی

In this paper, we propose spam e-mail filtering methods having high accuracies and low time complexities. The methods are based on the n-gram approach and a heuristics which is referred to as the first n-words heuristics. We develop two models, a class general model and an e-mail specific model, and test the methods under these models. The models are then combined in such a way that the latter one is activated for the cases the first model falls short. Though the approach proposed and the methods developed are general and can be applied to any language, we mainly apply them to Turkish, which is an agglutinative language, and examine some properties of the language. Extensive tests were performed and success rates about 98% for Turkish and 99% for English were obtained. It has been shown that the time complexities can be reduced significantly without sacrificing performance.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Pattern Recognition Letters - Volume 29, Issue 1, 1 January 2008, Pages 19–33
نویسندگان
, ,