Article code: 865567 | Journal code: 909674 | Year: 2009 | English article, 8 pages | Full text: PDF
ISI article title (English)
Non-Independent Term Selection for Chinese Text Categorization
Related subjects
Engineering and Basic Sciences / Other Engineering Disciplines / Engineering (General)
Abstract
Chinese text categorization differs from English text categorization in its much larger term set (of words or character n-grams), which makes both training and classification with modern high-performance classifiers very slow. This study assumes that this high-dimensionality problem stems from redundancy in the term set, which traditional term selection methods, since they evaluate each term independently, cannot remove. A greedy algorithm framework named “non-independent term selection” is presented, which reduces this redundancy according to string-level correlations between terms. Several preliminary implementations of this idea are demonstrated. Experimental results show that a good tradeoff can be reached between classification performance and the size of the term set.
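To make the idea of greedy, redundancy-aware term selection more concrete, here is a minimal sketch in Python. It is not the authors' algorithm. The following choices are assumptions made for illustration: "string-level correlation" is taken to mean that one term is a substring of the other, and two such terms are treated as redundant when their document sets overlap heavily (Jaccard similarity at or above a hypothetical `overlap` threshold). The function name `select_terms`, its parameters, and the idea that scores come from some independent measure (for example chi-square) are likewise hypothetical.

```python
from typing import Dict, List, Set


def select_terms(
    postings: Dict[str, Set[int]],  # term -> ids of documents containing it
    scores: Dict[str, float],       # term -> score from any independent measure (assumed)
    k: int,                         # maximum number of terms to keep
    overlap: float = 0.9,           # hypothetical Jaccard threshold for redundancy
) -> List[str]:
    """Greedily keep high-scoring terms, skipping a candidate that is a
    substring or superstring of an already kept term and occurs in nearly
    the same documents (an assumed notion of string-level redundancy)."""
    selected: List[str] = []
    # Visit candidates from highest to lowest score; ties broken alphabetically.
    for term in sorted(scores, key=lambda t: (-scores[t], t)):
        if len(selected) == k:
            break
        docs = postings[term]
        redundant = False
        for kept in selected:
            # String-level relation: one term contains the other.
            if term in kept or kept in term:
                kept_docs = postings[kept]
                union = docs | kept_docs
                jaccard = len(docs & kept_docs) / len(union) if union else 1.0
                if jaccard >= overlap:
                    redundant = True
                    break
        if not redundant:
            selected.append(term)
    return selected


if __name__ == "__main__":
    # Toy data: "中国" always co-occurs with "中国人", so it adds no new information.
    postings = {"中国": {1, 2, 3}, "中国人": {1, 2, 3}, "中": {1, 2, 3, 4, 5, 6}, "文本": {4, 5}}
    scores = {"中国人": 3.0, "中国": 2.5, "文本": 2.0, "中": 1.0}
    print(select_terms(postings, scores, k=3))  # ['中国人', '文本', '中']
```

In the toy run, the bigram "中国" is dropped because it appears in exactly the same documents as "中国人". The single character "中" is kept because it also occurs in documents where "中国人" does not. A purely independent selector would rank terms by score alone and keep "中国" here.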
Publisher
Database: Elsevier - ScienceDirect
Journal: Tsinghua Science & Technology - Volume 14, Issue 1, February 2009, Pages 113-120
Authors
, ,