Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
4950341 | 1440638 | 2017 | 30-page PDF | Free download |
English title of the ISI article
An optimized approach for massive web page classification using entity similarity based on semantic network
Keywords
Website classification, semantic network, kinship-relation association, class probability, inherited weight
Related topics
Engineering and basic sciences
Computer engineering
Computational theory and mathematics
Abstract
With the development of mobile technology, users' browsing habits have gradually shifted from pure information retrieval to active recommendation. The classification mapping between users' interests and web content has become increasingly difficult as the volume and variety of web pages grow. Some large news portals and social media companies hire more editors to label new concepts and words, and use computing servers with larger memory to handle massive document classification based on traditional supervised or semi-supervised machine learning methods. This paper provides an optimized algorithm for massive web page classification using semantic networks such as Wikipedia and WordNet. We used a Wikipedia data set and initialized a few category entity words as class words. A weight estimation algorithm based on the depth and breadth of the Wikipedia network is used to calculate the class weight of all Wikipedia Entity Words. A kinship-relation association based on the content similarity of entities is proposed to mitigate the imbalance problem that arises when a category node inherits probability from multiple parent nodes. Keywords are extracted from the title and main text of a web page using N-grams matched against Wikipedia Entity Words, and a Bayesian classifier is used to estimate the page class probability. Experimental results showed that the proposed method achieved good scalability, robustness and reliability for massive web pages.
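The last stage described in the abstract (N-gram keyword matching against entity words, followed by a Bayesian estimate of the page class) can be illustrated with a minimal sketch. It is not the authors' implementation. The entity table `ENTITY_CLASS_WEIGHTS`, its weight values, the class names, the `title_boost` factor, and both function names are hypothetical and chosen only for illustration. In the paper, the class weights are derived from the Wikipedia network rather than written by hand.

```python
import math
import re
from collections import Counter

# Hypothetical entity -> {class: weight} table. The values are made up;
# the paper derives such weights from the Wikipedia network structure.
ENTITY_CLASS_WEIGHTS = {
    "neural network": {"technology": 0.9, "sports": 0.1},
    "world cup": {"technology": 0.05, "sports": 0.95},
    "football": {"technology": 0.1, "sports": 0.9},
}


def extract_entities(text, entities, max_n=3):
    """Count word n-grams (n = 1..max_n) that appear in the entity table."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in entities:
                found[gram] += 1
    return found


def classify(title, body, weights=ENTITY_CLASS_WEIGHTS, title_boost=2):
    """Naive-Bayes-style class probabilities from matched entity words.

    title_boost is an assumption of this sketch: title matches are counted
    more heavily than body matches. Class priors are assumed uniform.
    """
    classes = sorted({c for w in weights.values() for c in w})
    counts = extract_entities(body, weights)
    for entity, k in extract_entities(title, weights).items():
        counts[entity] += title_boost * k

    # Sum log-weights per class, starting from a uniform log-prior.
    log_scores = {c: math.log(1.0 / len(classes)) for c in classes}
    for entity, k in counts.items():
        for c in classes:
            log_scores[c] += k * math.log(weights[entity].get(c, 1e-6))

    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}


probs = classify("World Cup final", "The football match drew a record crowd.")
print(max(probs, key=probs.get))
```

Working in log space avoids numeric underflow when a page contains many matched entities, which matters at the scale the abstract targets.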
Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 76, November 2017, Pages 510-518
Authors
Huakang Li, Zheng Xu, Tao Li, Guozi Sun, Kim-Kwang Raymond Choo