Heuristic Frequent Term-Based Clustering of News Headlines

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
493268	721685	2012	8 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر علوم کامپیوتر (عمومی)

پیش نمایش صفحه اول مقاله

Heuristic Frequent Term-Based Clustering of News Headlines

چکیده انگلیسی

Document clustering deals with assigning documents to groups (called clusters) in accordance with the general clustering rule, ‘high intra-cluster document similarity and low inter-cluster document similarity’. In this study, we propose a novel heuristics for clustering news headlines. News headlines are grammatically and semantically different from larger bodies of text, like blog posts and reviews. Based on the heuristics, we implemented versions of the frequent term-based and frequent noun-based clustering algorithms. Both these algorithms, along with k-means, regular frequent term and frequent noun clustering were evaluated using five datasets -Reuters343, Reuters2388 (news headlines), CICLing-2002, Hep-ex and KnCr (scientific abstracts). On interpreting the results based on common external cluster quality evaluation measures (purity, entropy and F measure), it was found that the heuristics performed at par with, or even better than, traditional clustering algorithms and few other intuitive algorithms, when tested using the datasets comprising of news headlines. However, on using the datasets comprising of scientific abstracts, the results were not favorable.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Procedia Technology - Volume 6, 2012, Pages 436-443

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Heuristic Frequent Term-Based Clustering of News Headlines

دسترسی سریع

ارتباط

English Website