Empirical evaluation of the link and content-based focused Treasure-Crawler

Article ID	Journal ID	Year	English article
454669	695267	2016	9-page PDF


Keywords

Focused Web crawler; T-Graph; HTML data; Information retrieval; Search engine


Abstract

• We present the experimental results of a focused Web crawler that combines link-based and content-based approaches to predict the topical focus of an unvisited page.
• We present a custom method that uses the Dewey Decimal Classification system to classify the subject of an unvisited page into standard categories of human knowledge.
• To prioritize an unvisited URL, we use a dynamic, flexible and continuously updated hierarchical data structure called T-Graph, which helps find the shortest path to on-topic pages on the Web.
• As background, we review the experimental results of several existing crawlers.
• We compare our results against other significant focused Web crawlers.
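The prioritization idea in the highlights (score each unvisited URL, then fetch the most promising one first) can be illustrated with a generic best-first crawl frontier. This is a minimal sketch, not the T-Graph itself: the scoring function `anchor_score`, which simply counts topic terms in a link's anchor text, and the example URLs and topic terms are hypothetical stand-ins for the paper's HTML-element and T-Graph based scoring.

```python
import heapq
import itertools


class Frontier:
    """Best-first crawl frontier: pops the highest-priority unvisited URL first."""

    def __init__(self):
        self._heap = []                    # entries: (-score, tie_breaker, url)
        self._counter = itertools.count()  # insertion order breaks score ties
        self._seen = set()                 # URLs ever queued, to skip duplicates

    def push(self, url, score):
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


def anchor_score(anchor_text, topic_terms):
    """Hypothetical relevance score: fraction of topic terms found in the anchor text."""
    words = set(anchor_text.lower().split())
    hits = sum(1 for term in topic_terms if term in words)
    return hits / len(topic_terms)


if __name__ == "__main__":
    topic = {"crawler", "retrieval", "search"}
    links = {
        "http://example.org/a": "focused crawler design",
        "http://example.org/b": "holiday photos",
        "http://example.org/c": "search and retrieval with a crawler",
    }
    frontier = Frontier()
    for url, anchor in links.items():
        frontier.push(url, anchor_score(anchor, topic))
    while frontier:
        url, score = frontier.pop()
        print(f"{score:.2f} {url}")
```

Running it prints the example URLs in descending score order (c, then a, then b). Negating the scores lets Python's min-heap `heapq` act as a max-priority queue, and the counter keeps ties in insertion order so URLs are never compared directly.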

Indexing the Web is becoming a laborious task for search engines as the Web grows exponentially in size and distribution. Presently, the most effective known approach to this problem is the use of focused crawlers. A focused crawler employs a dedicated algorithm to detect the pages on the Web that relate to its topic of interest. For this purpose, we proposed a custom method that uses specific HTML elements of the current page to predict the topical focus of every page targeted by an unvisited link within it. The pages recognized as on-topic are then sorted by their relevance to the crawler's main topic before they are actually downloaded. In the Treasure-Crawler, we use a hierarchical structure called T-Graph, which serves as an exemplary guide for assigning an appropriate priority score to each unvisited link. These URLs are later downloaded in order of this priority. This paper presents the implementation, test results and performance evaluation of the Treasure-Crawler system. The Treasure-Crawler is evaluated against standard information retrieval criteria such as recall and precision, both of which reach values close to 50%. Achieving such results demonstrates the significance of the proposed approach.
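The abstract reports recall and precision close to 50%. As a reminder of how these two standard criteria are computed for a crawl, here is a minimal sketch that assumes both the set of downloaded pages and a ground-truth set of on-topic pages are known. The page IDs are made up for illustration and are not the paper's data, and the paper's exact evaluation protocol may differ.

```python
def precision_recall(fetched, relevant):
    """Precision = |fetched ∩ relevant| / |fetched|; recall = |fetched ∩ relevant| / |relevant|."""
    fetched, relevant = set(fetched), set(relevant)
    hits = len(fetched & relevant)
    precision = hits / len(fetched) if fetched else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    fetched = ["p1", "p2", "p3", "p4"]   # hypothetical pages the crawler downloaded
    relevant = ["p1", "p2", "p5", "p6"]  # hypothetical on-topic pages (ground truth)
    p, r = precision_recall(fetched, relevant)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.50
```

Here two of the four downloaded pages are on-topic (precision 0.50) and two of the four on-topic pages were found (recall 0.50), the same order of values the abstract reports.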

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Standards & Interfaces - Volume 44, February 2016, Pages 54–62

Authors

Ali Seyfi, Ahmed Patel, Joaquim Celestino Júnior
