Clustering tagged documents with labeled and unlabeled documents

Article code	Journal code	Publication year	English article	Full-text version
515894	867136	2013	11-page PDF	Free download

Keywords

Document clustering; Semi-supervised clustering; Text mining

Related topics

Engineering and Basic Sciences, Computer Engineering, Computer Science Applications

Abstract

This study applies our proposed semi-supervised clustering method, Constrained-PLSA, to cluster tagged documents using a small number of labeled documents, and evaluates system performance on two data sets. In the first data set the boundaries among clusters are not clear, while in the second they are. Documents are clustered using paper abstracts together with the tags annotated by users, and four combinations of tags and words are used as feature representations. The experimental results indicate that almost all of the methods benefit from tags. However, unsupervised learning methods fail to function properly on the data set with noisy information, whereas Constrained-PLSA still functions properly. In many real applications background knowledge is readily available, so it is appropriate to use it in the clustering process to make learning faster and more effective.

► Employ four combinations of tags and words to analyze how tags can facilitate document clustering.
► Apply our proposed semi-supervised clustering algorithm to cluster tagged documents.
► Implement several state-of-the-art algorithms and compare them with our approach.
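
The abstract does not spell out which four combinations of tags and words were used, so the following is only an illustrative sketch of the general idea: building words-only, tags-only, and combined bag-of-features representations for a single document. The function name `build_features`, the `w:`/`t:` prefixes, and the toy document are all assumptions made for illustration and do not come from the paper.

```python
from collections import Counter


def build_features(abstract_words, tags, use_words=True, use_tags=True):
    """Return a bag-of-features Counter for one document.

    The "w:" and "t:" prefixes (a hypothetical convention) keep a word and
    an identically spelled tag as two separate features.
    """
    features = Counter()
    if use_words:
        features.update(f"w:{w.lower()}" for w in abstract_words)
    if use_tags:
        features.update(f"t:{t.lower()}" for t in tags)
    return features


# Hypothetical toy document, not taken from the paper's data sets.
doc_words = "semi supervised clustering of tagged documents".split()
doc_tags = ["clustering", "text-mining"]

words_only = build_features(doc_words, doc_tags, use_tags=False)
tags_only = build_features(doc_words, doc_tags, use_words=False)
combined = build_features(doc_words, doc_tags)
```

Any of these representations could then be passed to a clustering algorithm; how Constrained-PLSA itself consumes the features is not described on this page.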

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 49, Issue 3, May 2013, Pages 596–606

Authors

Chien-Liang Liu, Wen-Hoar Hsaio, Chia-Hoang Lee, Chun-Hsien Chen
