Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
405389 | 677551 | 2008 | 5-page PDF | Free download |
![First page of the article: Multinomial mixture model with feature selection for text clustering](/preview/png/405389.png)
Selecting relevant features is a hard problem in unsupervised text clustering because there are no class labels to guide the search. This paper proposes a new mixture model method for unsupervised text clustering, named multinomial mixture model with feature selection (M3FS). In M3FS, we introduce the concept of component-dependent “feature saliency” to the mixture model. We say a feature is relevant to a certain mixture component if its feature saliency value is higher than a predefined threshold. Feature selection is thus treated as a parameter estimation problem, and the Expectation–Maximization (EM) algorithm is used to estimate the model. Experimental results on commonly used text datasets show that the M3FS method has good clustering performance and feature selection capability.
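To make the ideas in the abstract concrete, here is a minimal Python sketch of EM for a plain multinomial mixture over word counts, followed by a per-component thresholding step. It is not the M3FS estimator: the abstract does not give the model equations, so the saliency score below is a hypothetical stand-in (how much more probable a word is under a component than in the corpus as a whole). The function names, the smoothing constant `alpha`, the threshold of 0.5, and the toy corpus are all assumptions made for illustration.

```python
import math
import random


def fit_multinomial_mixture(counts, n_components, n_iter=50, alpha=1.0, seed=0):
    """EM for a plain multinomial mixture (no feature saliency term)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Start from random soft assignments of documents to components.
    resp = []
    for _ in range(n_docs):
        row = [rng.random() + 1e-3 for _ in range(n_components)]
        total = sum(row)
        resp.append([r / total for r in row])
    eps = 1e-9  # keeps mixing weights strictly positive
    for _ in range(n_iter):
        # M-step: mixing weights and smoothed per-component word probabilities.
        weights = [
            (sum(resp[i][k] for i in range(n_docs)) + eps)
            / (n_docs + n_components * eps)
            for k in range(n_components)
        ]
        theta = []
        for k in range(n_components):
            mass = [
                alpha + sum(resp[i][k] * counts[i][j] for i in range(n_docs))
                for j in range(n_words)
            ]
            total = sum(mass)
            theta.append([m / total for m in mass])
        # E-step: posterior responsibilities, computed in log space.
        for i in range(n_docs):
            log_post = [
                math.log(weights[k])
                + sum(counts[i][j] * math.log(theta[k][j]) for j in range(n_words))
                for k in range(n_components)
            ]
            peak = max(log_post)
            unnorm = [math.exp(lp - peak) for lp in log_post]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]
    return weights, theta, resp


def saliency_proxy(counts, theta):
    """Illustrative stand-in for feature saliency, NOT the paper's estimator.

    Scores lie in (0, 1); a score above 0.5 means the word is more probable
    under the component than under the smoothed corpus-wide distribution.
    """
    n_words = len(counts[0])
    col = [sum(doc[j] for doc in counts) for j in range(n_words)]
    total = sum(col)
    background = [(c + 1.0) / (total + n_words) for c in col]
    return [[t / (t + b) for t, b in zip(row, background)] for row in theta]


def select_features(saliency, threshold=0.5):
    """Keep, per component, the features whose saliency exceeds the threshold."""
    return [[j for j, s in enumerate(row) if s > threshold] for row in saliency]


# Toy corpus (assumed data): rows are documents, columns are word counts.
counts = [
    [5, 4, 0, 0], [6, 3, 0, 0], [4, 5, 0, 1],
    [0, 0, 5, 4], [0, 1, 6, 3], [0, 0, 4, 5],
]
weights, theta, resp = fit_multinomial_mixture(counts, n_components=2)
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
selected = select_features(saliency_proxy(counts, theta))
print(labels, selected)
```

In this sketch the saliency is computed after fitting, whereas the abstract describes saliency as a parameter estimated jointly with the mixture inside EM. The thresholding rule in `select_features` is the only part taken directly from the abstract.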
Journal: Knowledge-Based Systems - Volume 21, Issue 7, October 2008, Pages 704–708