Article code: 388511
Journal code: 660926
Publication year: 2011
English article: 8 pages, PDF
English title of the ISI article
A subspace decision cluster classifier for text classification
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract

This paper proposes a new classification method, subspace decision cluster classification (SDCC), for high-dimensional text data with multiple classes. An SDCC model consists of a set of disjoint subspace decision clusters, each labeled with a dominant class that determines the class of new objects falling in the cluster. A cluster tree is first generated from a training data set by recursively calling a subspace clustering algorithm, Entropy Weighting k-Means. The SDCC model is then extracted from this subspace decision cluster tree. Several tests, including the Anderson–Darling test, are used to determine the stopping condition for tree growth. In a series of experiments on real text data sets, SDCC outperforms existing methods such as decision trees and SVM. SDCC is particularly suitable for large, high-dimensional, sparse text data with many classes.
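The tree-then-leaves idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain 2-means stands in for Entropy Weighting k-Means, a simple purity and size threshold stands in for the Anderson–Darling stopping test, and new objects are assigned to the nearest leaf centroid. All function names, parameters, and the toy data are hypothetical.

```python
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def two_means(points, iters=20):
    """Plain 2-means with deterministic seeding (assumed stand-in for EWKM).

    Returns two lists of indices into `points`; one may be empty
    if the data cannot be split (e.g. all points identical).
    """
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))  # point farthest from c0
    centers = [c0, c1]
    groups = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for i, p in enumerate(points):
            k = 0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
            groups[k].append(i)
        if not groups[0] or not groups[1]:
            break
        centers = [mean([points[i] for i in g]) for g in groups]
    return groups

def build_leaves(points, labels, min_purity=0.95, min_size=4):
    """Grow the cluster tree recursively; return leaves as (centroid, dominant class).

    Assumed stopping rule: stop when the dominant class is frequent enough
    or the node is small. The paper uses statistical tests instead.
    """
    dominant, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= min_purity or len(points) < min_size:
        return [(mean(points), dominant)]
    groups = two_means(points)
    if not groups[0] or not groups[1]:
        return [(mean(points), dominant)]
    leaves = []
    for g in groups:
        leaves += build_leaves([points[i] for i in g],
                               [labels[i] for i in g],
                               min_purity, min_size)
    return leaves

def classify(leaves, x):
    """Label x with the dominant class of the nearest leaf centroid."""
    return min(leaves, key=lambda leaf: dist2(leaf[0], x))[1]

# Toy data: three well-separated classes in two dimensions.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (1, 10), (0, 11), (1, 11)]
labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
leaves = build_leaves(points, labels)
```

Only the leaves are kept as the classifier here, which mirrors the abstract's statement that the model is extracted from the tree rather than being the tree itself.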

In text classification, one challenging problem is the sparse, high-dimensional feature space, which can render distance metrics meaningless for discriminating between samples. This paper presents a novel subspace decision cluster classification (SDCC) method that solves classification problems through clustering, exploiting subspace clustering to identify class distributions in meaningful subspaces. SDCC generates a cluster tree from a training data set by recursively calling a subspace clustering algorithm, Entropy Weighting k-Means. The classification model is then extracted from the subspace decision cluster tree. In a series of experiments on real text data sets, SDCC outperforms SVM and some other methods, especially for high-dimensional text data with many classes.
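To make "meaningful subspaces" concrete, the sketch below computes per-cluster feature weights in the style commonly associated with entropy-weighted k-means: each dimension's weight decreases exponentially with the cluster's dispersion along that dimension, and the weights are normalized to sum to one. The exact update used in the paper is not given in this abstract, so the formula, the `gamma` parameter, and the function name are assumptions.

```python
import math

def subspace_weights(cluster, gamma=1.0):
    """Per-dimension weights for one cluster (assumed entropy-weighting form).

    Dimensions along which the cluster is tight receive large weights;
    dimensions along which it is spread out receive weights near zero.
    """
    n = len(cluster)
    center = [sum(col) / n for col in zip(*cluster)]
    # Sum of squared deviations from the cluster center, per dimension.
    dispersion = [sum((p[t] - center[t]) ** 2 for p in cluster)
                  for t in range(len(center))]
    # Subtract the minimum before exponentiating for numerical stability;
    # this does not change the normalized result.
    m = min(dispersion)
    raw = [math.exp(-(d - m) / gamma) for d in dispersion]
    total = sum(raw)
    return [r / total for r in raw]

# Toy cluster: tight in dimensions 0 and 2, widely spread in dimension 1.
cluster = [(1.0, 0.0, 5.0), (1.0, 9.0, 5.1), (1.0, 3.0, 4.9)]
weights = subspace_weights(cluster)
```

In this toy cluster the second dimension receives almost no weight, so a weighted distance would effectively compare objects only in the subspace spanned by the first and third dimensions.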

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 38, Issue 10, 15 September 2011, Pages 12475–12482