Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
535950 | 870412 | 2011 | 8-page PDF | Free download |
Feature subset selection (FSS) is an important step in building effective text classification (TC) systems. This paper presents an empirical comparison of seventeen traditional FSS metrics for TC tasks. Classification is restricted to a support vector machine (SVM) classifier and to Arabic articles. The evaluation uses a corpus of 7842 documents independently classified into ten categories. Experimental results are reported in terms of macro-averaged precision, macro-averaged recall, and macro-averaged F1. The results show that the Chi-square and Fallout FSS metrics work best for Arabic TC tasks.
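The macro-averaged measures named in the abstract can be illustrated with a minimal sketch. It assumes the common definition in which precision, recall, and F1 are computed per category and then averaged with equal weight per category; the abstract does not say whether macro-F1 is the mean of per-category F1 scores (used here) or the F1 of the macro-averaged precision and recall. The function name `macro_scores` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro_precision, macro_recall, macro_f1).

    Assumption: macro-F1 is the unweighted mean of per-category F1 scores.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with two hypothetical categories.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
```

Because every category contributes equally to the average, a small category that is classified poorly lowers the macro score as much as a large one would.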
► Show the sources of difficulty in Arabic TC.
► Show the need for feature selection.
► Comparison of seventeen traditional FSS metrics for Arabic TC tasks.
► Use of IR performance metrics as FSS metrics for Arabic TC tasks.
► Comparison of SVM, NB, kNN and Rocchio classifiers for Arabic TC tasks.
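As an illustration of how an FSS metric scores a term, the sketch below computes the widely used Chi-square statistic from a 2×2 term/category contingency table. This is the textbook formulation and is assumed here; the abstract does not give the exact variant used in the paper. The function names and the toy documents are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term or category is degenerate; no evidence either way
    return n * (a * d - c * b) ** 2 / denom


def score_terms(docs, category):
    """Score every term in `docs` (a list of (token_set, label)) for `category`."""
    vocab = set().union(*(tokens for tokens, _ in docs))
    scores = {}
    for term in vocab:
        a = sum(1 for tokens, label in docs if term in tokens and label == category)
        b = sum(1 for tokens, label in docs if term in tokens and label != category)
        c = sum(1 for tokens, label in docs if term not in tokens and label == category)
        d = sum(1 for tokens, label in docs if term not in tokens and label != category)
        scores[term] = chi_square(a, b, c, d)
    return scores


# Invented toy corpus: four documents, two categories.
docs = [
    ({"goal", "match"}, "sport"),
    ({"goal", "team"}, "sport"),
    ({"market", "bank"}, "economy"),
    ({"bank", "team"}, "economy"),
]
scores = score_terms(docs, "sport")
# Keep the highest-scoring terms as the selected feature subset.
top_terms = sorted(scores, key=scores.get, reverse=True)[:2]
```

Note that the statistic is symmetric: a term that appears only outside the category scores as high as one that appears only inside it, since both are strongly informative about category membership.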
Journal: Pattern Recognition Letters - Volume 32, Issue 14, 15 October 2011, Pages 1922–1929