Free article download: Stable web spam detection using features based on lexical items

Article code	Journal code	Publication year	English article	Full-text version
455876	695595	2014	15-page PDF	Free download

English title of the ISI article

Stable web spam detection using features based on lexical items

Keywords

Web spam detection, spam detection features, regular expressions, context analysis, lexical items analysis

Related topics

Engineering and Basic Sciences, Computer Engineering, Computer Networks and Communications

Article preview

Abstract

Web spam is a method of manipulating search engine results by improving the ranks of spam pages. It takes various forms and lacks a consistent definition. Web spam detectors use machine learning techniques to detect spam. However, the detectors are mostly verified on data sets coming from the same year as the learning sets. In this paper we compared Support Vector Machine classifiers trained and tested on WEBSPAM-UK data sets from different years. To obtain stable results we proposed new lexical-based features. The HTML document (transformed into a text without HTML tags, a set of visible symbols, and a list of links including the ones from tags) gave information about weird combinations of letters; consonant clusters; statistics on syllables, words, and sentences; and the Gunning Fog Index. Using data collected in 2006 as a learning set, we obtained very stable accuracy across years. This choice of the training set reduced the sensitivity on the 2007 data, but that can be improved by managing the acceptance threshold. Finally, we showed that the balance between the sensitivity and the specificity, measured by the Area Under the Curve (AUC), is improved by our selection of features.
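To make the lexical features named in the abstract more concrete, the following is a minimal Python sketch of three of them: a syllable estimate, the longest consonant cluster, and the Gunning Fog Index. It is not the authors' implementation. The function names, the vowel-run syllable heuristic, the treatment of "y" as a vowel, and the regular expressions are all assumptions made for illustration, and the Fog Index here uses the common simplified form 0.4 × (words per sentence + 100 × share of words with three or more syllables), without the usual exclusions for proper nouns and suffixes.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count maximal vowel runs, minimum 1."""
    runs = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(runs))


def longest_consonant_cluster(text: str) -> int:
    """Length of the longest run of consecutive consonant letters ('y' counted as a vowel)."""
    clusters = re.findall(r"[b-df-hj-np-tv-xz]+", text.lower())
    return max((len(c) for c in clusters), default=0)


def gunning_fog(text: str) -> float:
    """Simplified Gunning Fog Index of plain text (HTML tags assumed already stripped)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    # "Complex" words are approximated as those with three or more estimated syllables.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * (len(words) / len(sentences) + 100.0 * complex_words / len(words))
```

In a pipeline like the one the abstract describes, values such as these would be computed per page and passed, together with other features, to the classifier. Long consonant clusters are one plausible way to capture "weird combinations of letters" in machine-generated text, though the paper's exact feature definitions may differ.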

Publisher

Database: Elsevier - ScienceDirect
Journal: Computers & Security - Volume 46, October 2014, Pages 79–93

Authors

Marcin Luckner, Michał Gad, Paweł Sobkowiak
