Article code: 536003
Journal code: 870424
Publication year: 2011
English article: 9 pages (PDF)
Full-text version: free download
English title of the ISI article
Class-dependent projection based method for text categorization
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
Abstract

Text categorization poses unique challenges to traditional classification methods because real-world datasets contain both a large number of features and a large number of training samples. In high-dimensional document data, each class is typically characterized by only a subset of the features, and these subsets typically differ between classes on different topics. This paper presents a simple but effective classifier for text categorization based on a class-dependent projection method. By projecting onto a set of individual subspaces, samples belonging to different document classes are separated so that they are easy to classify. This is achieved by developing a new supervised feature weighting algorithm that learns the optimized subspaces for all the document classes. Experiments on common benchmark corpora showed that the proposed method achieved both higher classification accuracy and lower computational cost than several prominent classifiers for text categorization, especially on datasets whose document categories have overlapping topics.

Research highlights
► We introduce a class-dependent projection method for text categorization. In the new method, each document category is projected into its own reduced subspace so that different classes become easily separable. The subspaces are generated using a soft feature weighting scheme and differ from class to class.
► We extend the traditional centroid-based classifier (CBC) into a simple but effective classifier for text categorization. The new classifier inherits the simplicity and efficiency of CBC but significantly improves classification accuracy by using our class-dependent projection method in both its training and testing phases.
► The experiments show that the new classifier is robust with respect to the number of terms used to represent the documents, and that it can outperform SVM-based classifiers when the document categories overlap considerably.
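
The idea in the highlights above, a centroid-based classifier in which every class measures distance in its own softly weighted feature subspace, can be illustrated with a minimal sketch. The listing does not give the paper's feature weighting algorithm, so the weighting below is an assumption made only for illustration: a term receives a larger weight for a class when that class's centroid holds a larger share of the term's total centroid mass. The function names `fit` and `predict` and the toy term-count matrix are likewise hypothetical.

```python
import numpy as np

def fit(X, y, eps=1e-12):
    """Learn one centroid and one soft feature-weight vector per class.

    The weighting rule is an illustrative assumption, not the paper's
    algorithm. It expects non-negative features such as term counts.
    """
    classes = np.unique(y)
    # One centroid per class: the mean document vector of that class.
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    # Share of each term's total centroid mass held by each class.
    share = centroids / (centroids.sum(axis=0, keepdims=True) + eps)
    # Normalize so that each class's weights sum to (approximately) 1.
    weights = share / (share.sum(axis=1, keepdims=True) + eps)
    return classes, centroids, weights

def predict(X, classes, centroids, weights):
    """Assign each document to the class with the smallest weighted distance."""
    # diff[i, c, j] = X[i, j] - centroids[c, j]
    diff = X[:, None, :] - centroids[None, :, :]
    # Squared distance to each centroid, measured with that class's weights.
    dist = (weights[None, :, :] * diff ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

# Toy term-count matrix: terms 0-1 mark class 0, term 2 marks class 1,
# and term 3 is shared by both classes.
X_train = np.array([[3, 2, 0, 1],
                    [2, 3, 0, 1],
                    [0, 0, 3, 1],
                    [0, 1, 2, 1]], dtype=float)
y_train = np.array([0, 0, 1, 1])

classes, centroids, weights = fit(X_train, y_train)
X_test = np.array([[2, 2, 0, 1],
                   [0, 0, 3, 1]], dtype=float)
predictions = predict(X_test, classes, centroids, weights)
```

Because the weights differ per class, the same document is compared with each centroid in a different subspace, which is the property the highlights attribute to the class-dependent projection. A supervised weighting learned by optimization, as the abstract describes, would replace the share-based rule used here.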

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 32, Issue 10, 15 July 2011, Pages 1493–1501