کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
387185 660897 2009 7 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Detection of cloaked web spam by using tag-based methods
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
Detection of cloaked web spam by using tag-based methods
چکیده انگلیسی

Web spam attempts to influence search engine ranking algorithm in order to boost the rankings of specific web pages in search engine results. Cloaking is a widely adopted technique of concealing web spam by replying different content to search engines’ crawlers from that displayed in a web browser. Previous work on cloaking detection is mainly based on the differences in terms and/or links between multiple copies of a URL retrieved from web browser and search engine crawler perspectives. This work presents three methods of using difference in tags to determine whether a URL is cloaked. Since the tags of a web page generally do not change as frequently and significantly as the terms and links of the web page, tag-based cloaking detection methods can work more effectively than the term- or link-based methods. The proposed methods are tested with a dataset of URLs covering short-, medium- and long-term users’ interest. Experimental results indicate that the tag-based methods outperform term- or link-based methods in both precision and recall. Moreover, a Weka J4.8 classifier using a combination of term and tag features yields an accuracy rate of 90.48%.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Expert Systems with Applications - Volume 36, Issue 4, May 2009, Pages 7493–7499
نویسندگان
,