Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
865567 | 909674 | 2009 | 8-page PDF | Free download |
English title of the ISI article
Non-Independent Term Selection for Chinese Text Categorization
Related topics
Engineering and basic sciences
Other engineering disciplines
Engineering (general)
English abstract
Chinese text categorization differs from English text categorization due to its much larger term set (of words or character n-grams), which results in very slow training and working of modern high-performance classifiers. This study assumes that this high-dimensionality problem is related to the redundancy in the term set, which cannot be solved by traditional term selection methods. A greedy algorithm framework named “non-independent term selection” is presented, which reduces the redundancy according to string-level correlations. Several preliminary implementations of this idea are demonstrated. Experiment results show that a good tradeoff can be reached between the performance and the size of the term set.
Publisher
Database: Elsevier - ScienceDirect
Journal: Tsinghua Science & Technology - Volume 14, Issue 1, February 2009, Pages 113-120
Authors
Li (李景阳), Sun (孙茂松)