Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
412791 | 679683 | 2010 | 10-page PDF | Free download |
This paper presents a new generalized Dirichlet (GD) mixture model to address the challenging problem of clustering multidimensional data sets on different feature subsets. We approximate the class-conditional distributions of the mixture components to define a binary relevance of features at the cluster level. We consider a feature relevant if it provides the knowledge needed to assign data points to the cluster. We then define a new message length objective to learn the model and to select both the feature subsets and the number of components. The proposed method is more general than existing feature selection and subspace clustering models. In addition, it selects for each cluster only relevant and statistically independent features, in time linear in the number of observations and dimensions. Experiments on synthetic data and on unsupervised image categorization show the merits of our approach.
Journal: Neurocomputing - Volume 73, Issues 10–12, June 2010, Pages 1730–1739