Article code | Journal code | Publication year | English article | Full-text version
382965 | 660798 | 2015 | 7-page PDF | Free download
English title of the ISI article
A classification approach for less popular webpages based on latent semantic analysis and rough set model
Persian translation of the title
یک روش طبقه بندی برای صفحات وب کمتر محبوب بر اساس تجزیه و تحلیل معنایی پنهان و مدل مجموعه خشن
Keywords
Webpage classification, complex network analysis, rough set, latent semantic analysis
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract


• We explore the classification of less popular webpages, which have sparse tags.
• LSA extends the information carried by the sparse tags of less popular webpages.
• The proposed method divides them into hubs, bridges and attached webpages.
• A density-relation-based rough set model is built to classify attached webpages.
• Attached webpages are semantically classified with an increase in modularity.
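
To make the rough set idea in the highlights concrete, here is a minimal sketch of the classical (Pawlak) lower and upper approximations. It is not the density-relation-based model named above, whose definition is not given on this page. The page names, tags and the `programming` category are hypothetical toy data.

```python
from collections import defaultdict

def equivalence_classes(attributes):
    """Group objects that are indiscernible, i.e. have identical attribute values."""
    classes = defaultdict(set)
    for obj, value in attributes.items():
        classes[value].add(obj)
    return list(classes.values())

def approximations(classes, target):
    """Classical rough set lower and upper approximations of `target`."""
    lower, upper = set(), set()
    for cls in classes:
        if cls <= target:   # class lies entirely inside the target: certain members
            lower |= cls
        if cls & target:    # class overlaps the target: possible members
            upper |= cls
    return lower, upper

# Hypothetical toy data: each page is described only by its (sparse) tag set.
page_tags = {
    "p1": frozenset({"python", "code"}),
    "p2": frozenset({"python", "code"}),
    "p3": frozenset({"code"}),
    "p4": frozenset({"code"}),
    "p5": frozenset({"recipe"}),
}
programming = {"p1", "p2", "p3"}  # pages assumed to belong to the category

classes = equivalence_classes(page_tags)
lower, upper = approximations(classes, programming)
boundary = upper - lower  # pages whose membership cannot be decided from tags alone
print(sorted(lower), sorted(upper), sorted(boundary))
```

In this toy setting the boundary region plays the role of ambiguously classified pages: `p3` and `p4` carry the same single tag, yet only one of them is in the category, so their tags alone cannot settle the membership of either.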

Nowadays, with the explosive growth of web information, webpage classification faces a great challenge. Computers have difficulty understanding the semantic meaning of textual or non-textual webpages. Fortunately, Web 2.0 based collaborative tagging systems bring new opportunities to solve this problem: they abstract structured tags from the unstructured content of webpages. However, large numbers of webpages on the Internet are less popular. Their tagging information is sparse, which makes their topic unclear and leads to ambiguous classification. Inspired by this “ambiguous classification”, we name the less popular webpage a “hesitant webpage”. In this paper, we propose an advanced approach for hesitant webpage classification. Firstly, hesitant webpages are divided into bridges, hubs and attached webpages according to their roles on the Internet. Secondly, attached webpages are classified by mining and extending their information from two perspectives. One is latent semantic analysis (LSA), which is applied to fully explore the semantic meaning of sparse tags. It promotes accurate recognition of webpages semantically close to attached webpages. The other is the proposed density-relation-based rough set model, which measures the affiliation degree of attached webpages in different categories. Experiments on real data show that our approach effectively classifies hesitant webpages based on their semantic meaning.
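
As an illustration of how LSA can extend sparse tag information, the sketch below applies a truncated SVD to a small tag-by-page matrix and compares pages in the resulting latent space. It is a generic LSA example under stated assumptions, not the authors' implementation: the tags, the matrix `X`, the choice `k = 2` and the helper names are all hypothetical.

```python
import numpy as np

def lsa_page_vectors(tag_page, k):
    """Return one k-dimensional latent vector per page (column of tag_page)."""
    # Thin SVD: tag_page = U @ diag(s) @ Vt, singular values in descending order.
    _, s, vt = np.linalg.svd(tag_page, full_matrices=False)
    # Keep the k strongest latent dimensions, scaled by their singular values.
    return (s[:k, None] * vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical toy data: rows are tags, columns are pages 0..3.
X = np.array([
    [1, 1, 0, 0],  # "python"
    [1, 1, 1, 0],  # "code"
    [0, 1, 1, 0],  # "programming"
    [0, 0, 0, 1],  # "recipe"
    [0, 0, 0, 1],  # "cooking"
], dtype=float)

pages = lsa_page_vectors(X, k=2)

raw_sim = cosine(X[:, 0], X[:, 2])          # pages 0 and 2 share a single tag
latent_sim = cosine(pages[0], pages[2])     # same pair, compared in latent space
unrelated_sim = cosine(pages[0], pages[3])  # programming page vs. cooking page
print(raw_sim, latent_sim, unrelated_sim)
```

Pages 0 and 2 share only the tag "code", so their raw cosine similarity is 0.5. In the two-dimensional latent space they become almost identical, because both co-occur with page 1, while the cooking page stays essentially orthogonal to them.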

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 42, Issue 1, January 2015, Pages 642–648