Article code: 405389
Journal code: 677551
Publication year: 2008
English article: 5-page PDF
Full-text version: free download
English title of the ISI article
Multinomial mixture model with feature selection for text clustering
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
Abstract

Selecting relevant features is a hard problem in unsupervised text clustering because there are no class labels to guide the search. This paper proposes a new mixture model method for unsupervised text clustering, named multinomial mixture model with feature selection (M3FS). In M3FS, we introduce the concept of component-dependent "feature saliency" to the mixture model. A feature is considered relevant to a given mixture component if its saliency value for that component is higher than a predefined threshold. Feature selection is thus treated as a parameter estimation problem, and the Expectation–Maximization (EM) algorithm is used to estimate the model. Experimental results on commonly used text datasets show that the M3FS method has good clustering performance and feature selection capability.
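
The abstract does not give the M3FS update equations, so the following is only a minimal sketch of the two ingredients it names: EM for a plain multinomial mixture over word counts, and the rule that a feature is relevant to a component when its saliency exceeds a threshold. It is not the authors' algorithm. The function names, the smoothing constant, the deterministic initialization from evenly spaced documents, and the toy data are all assumptions made for illustration, and the saliency matrix is taken as given rather than estimated.

```python
import math


def fit_multinomial_mixture(counts, n_components, n_iter=30, smooth=0.01):
    """EM for a plain multinomial mixture over word-count rows (no feature saliency)."""
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    # Assumed start: seed each component from an evenly spaced document.
    step = max(1, n_docs // n_components)
    word_probs = [
        normalize([c + smooth for c in counts[(k * step) % n_docs]])
        for k in range(n_components)
    ]
    weights = [1.0 / n_components] * n_components
    resp = []

    for _ in range(n_iter):
        # E-step: posterior probability of each component per document, in log space.
        resp = []
        for row in counts:
            log_post = [
                math.log(weights[k])
                + sum(c * math.log(word_probs[k][d]) for d, c in enumerate(row))
                for k in range(n_components)
            ]
            top = max(log_post)
            resp.append(normalize([math.exp(lp - top) for lp in log_post]))

        # M-step: re-estimate mixing weights and smoothed word probabilities.
        weights = normalize(
            [sum(r[k] for r in resp) + smooth for k in range(n_components)]
        )
        word_probs = [
            normalize(
                [
                    smooth + sum(resp[i][k] * counts[i][d] for i in range(n_docs))
                    for d in range(n_words)
                ]
            )
            for k in range(n_components)
        ]

    return weights, word_probs, resp


def select_features(saliency, threshold):
    """Per component, indices of features whose saliency is above the threshold."""
    return [[d for d, s in enumerate(row) if s > threshold] for row in saliency]


# Toy word counts (made up): two documents per topic, four vocabulary words.
docs = [[5, 4, 0, 0], [6, 3, 0, 1], [0, 0, 5, 5], [0, 1, 4, 6]]
weights, word_probs, resp = fit_multinomial_mixture(docs, n_components=2)
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
```

In this sketch each document is assigned to the component with the highest posterior probability, and `select_features` applies the threshold rule row by row to a hypothetical component-by-feature saliency matrix. In M3FS the saliency values would themselves be parameters estimated by EM, which this sketch does not attempt.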

Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 21, Issue 7, October 2008, Pages 704–708
Authors
, ,