Article code: 515288
Journal code: 866979
Publication year: 2006
English article: 23-page PDF
Full-text version: free download
English title of the ISI article
Combining preference- and content-based approaches for improving document clustering effectiveness
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract

E-commerce and knowledge management applications generate and consume tremendous amounts of online information that is typically available as textual documents. To facilitate subsequent access of and leverage from these textual documents, the efficient and effective management of the ever-increasing volume of documents is essential to both organizations and individuals. Document management practices suggest the popularity of using categories (e.g., folders) for organizing, archiving, and accessing documents. Document clustering represents an appealing approach to enable organizations or individuals to create and maintain document categories automatically. Existing document clustering techniques usually group together similar documents on the basis of their textual content similarity. However, such content-based approaches operate at the lexical level and suffer greatly from the word mismatch problem. Therefore, this study aims to address this problem by exploiting users’ document grouping preferences, as exhibited in those individuals’ folder sets, to support document clustering. Specifically, we propose a hybrid document clustering technique that combines preference- and content-based approaches. Using a traditional content-based and a preference/content switching document clustering technique as performance benchmarks, our empirical evaluation results show that the proposed hybrid technique improves the clustering effectiveness measured by both cluster precision and cluster recall.

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 42, Issue 2, March 2006, Pages 350–372