Article code: 461735
Journal code: 696627
Publication year: 2013
English article: 13-page PDF
Full-text version: free download
English title of the ISI article
SAAD, a content based Web Spam Analyzer and Detector
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Networks and Communications
English abstract


• SAAD detects Web Spam pages based on new heuristics combined with a C4.5 classifier.
• The study compares SAAD with previous studies on two well-known public datasets.
• The paper discusses the effectiveness and efficiency of heuristics used by SAAD.
• SAAD achieves a precision of 0.972 and recall of 0.850.
• SAAD obtains results 6% to 27% better than those of existing systems.
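
As a rough aid for comparing the figures above, the reported precision and recall can be combined into an F1 score (their harmonic mean). This is a derived value worked out here for illustration, not a number reported in the highlights, and the function name `f1_score` is our own.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the highlights.
saad_f1 = f1_score(0.972, 0.850)
print(round(saad_f1, 3))  # 0.907
```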

Web Spam is one of the main difficulties that crawlers have to overcome and therefore one of the main problems of the WWW. There are several studies about characterising and detecting Web Spam pages. However, none of them deals with all the possible kinds of Web Spam. This paper presents an analysis of different kinds of Web Spam pages and identifies new elements that characterise them, in order to define heuristics that can partially detect them. We also discuss and explain several heuristics from the point of view of their effectiveness and computational efficiency. Taking them into account, we study several sets of heuristics and demonstrate how they improve the current results. Finally, we propose a new Web Spam detection system called SAAD (Spam Analyzer And Detector), which is based on the set of proposed heuristics and their use in a C4.5 classifier improved by means of Bagging and Boosting techniques. We have also tested our system on some well-known Web Spam datasets and we have found it to be very effective.
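
The abstract describes feeding content heuristics to a C4.5 classifier improved with Bagging and Boosting. The sketch below illustrates only the bagging idea, under several assumptions that are not from the paper: it uses one-level decision stumps as a stand-in for C4.5 trees, omits boosting, and the two features (fraction of hidden text, keyword repetition ratio) and the toy pages are hypothetical placeholders rather than SAAD's actual heuristics or data.

```python
import random
from collections import Counter


def train_stump(data):
    """Pick the (feature, threshold, polarity) with the fewest training errors.

    A stump predicts `polarity` when x[feature] > threshold, else 1 - polarity.
    This is a simplified stand-in for a C4.5 tree, not C4.5 itself.
    """
    best = None
    for f in range(len(data[0][0])):
        for x, _ in data:
            t = x[f]
            for polarity in (1, 0):
                errors = 0
                for xi, yi in data:
                    pred = polarity if xi[f] > t else 1 - polarity
                    errors += pred != yi
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    _, f, t, polarity = best
    return f, t, polarity


def predict_stump(stump, x):
    f, t, polarity = stump
    return polarity if x[f] > t else 1 - polarity


def train_bagging(data, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_estimators):
        bootstrap = [rng.choice(data) for _ in data]
        stumps.append(train_stump(bootstrap))
    return stumps


def predict_bagging(stumps, x):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(s, x) for s in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical toy pages: ([hidden_text_fraction, keyword_repetition], label),
# where label 1 = spam and 0 = non-spam.
pages = [
    ([0.90, 0.80], 1), ([0.85, 0.70], 1), ([0.70, 0.90], 1), ([0.80, 0.75], 1),
    ([0.10, 0.20], 0), ([0.05, 0.10], 0), ([0.20, 0.15], 0), ([0.15, 0.05], 0),
]

model = train_bagging(pages)
spam_like = predict_bagging(model, [0.95, 0.90])
clean_like = predict_bagging(model, [0.02, 0.03])
```

Averaging many learners trained on resampled data reduces the variance of any single tree, which is the general motivation for bagging; whether it matches the exact configuration used in SAAD is not stated in the abstract.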

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Systems and Software - Volume 86, Issue 11, November 2013, Pages 2906–2918