Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
534873 | 870297 | 2011 | 8-page PDF | Free download |

This paper examines the Rocchio algorithm and its application to text categorization. Existing approaches that optimize the Rocchio algorithm's parameters globally end up choosing one fixed prototype to represent each category in multi-category text categorization problems. As a result, they have limited discriminating power across categories with different distributions, and their parameter optimization relies on negative samples that, being drawn from several categories at once, have weak representation ability. We present a pairwise optimized Rocchio algorithm, which dynamically adjusts the prototype position between pairs of categories. Experiments were conducted on three benchmark corpora: 20-Newsgroup, Reuters-21578, and TDT2. The results confirm that the proposed pairwise method achieves an encouraging performance improvement over the conventional Rocchio method. A comparative study with a leading text classifier, the Support Vector Machine (SVM), also shows that the pairwise Rocchio method achieves competitive results.
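For readers unfamiliar with the baseline, the conventional Rocchio classifier that the abstract refers to can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the function names, the toy vectors, and the weights `beta=1.0` and `gamma=0.25` are assumptions chosen for the example. Each category gets one fixed prototype (its centroid minus a weighted centroid of all other documents), and a document is assigned to the category whose prototype is most similar by cosine.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rocchio_prototypes(X, y, beta=1.0, gamma=0.25):
    """One fixed prototype per category: beta * positive centroid
    minus gamma * centroid of all documents from the other categories.
    beta and gamma are global here (assumed values, not from the paper)."""
    by_label = defaultdict(list)
    for vec, label in zip(X, y):
        by_label[label].append(vec)
    prototypes = {}
    for label, pos in by_label.items():
        neg = [v for other, vs in by_label.items() if other != label for v in vs]
        pos_c = centroid(pos)
        neg_c = centroid(neg) if neg else [0.0] * len(pos_c)
        prototypes[label] = [beta * p - gamma * n for p, n in zip(pos_c, neg_c)]
    return prototypes


def classify(prototypes, vec):
    """Assign the category whose prototype has the highest cosine similarity."""
    return max(prototypes, key=lambda label: cosine(prototypes[label], vec))


# Toy term-weight vectors (hypothetical data, three terms per document).
X = [[3.0, 0.0, 1.0], [2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 2.0, 0.0]]
y = ["sports", "sports", "politics", "politics"]
protos = rocchio_prototypes(X, y)
```

Note that the negative centroid pools every other category together, which is the "weak representation ability of the negative samples" the abstract criticizes.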
Research Highlights
► The conventional Rocchio algorithm has weak representational ability because it chooses one fixed prototype for each category.
► The pairwise optimized method dynamically adjusts the prototype position between pairs of categories.
► Text categorization experiments were conducted on three benchmark corpora, the 20-Newsgroup, Reuters-21578, and TDT2.
► The results confirm that the proposed pairwise method achieves an encouraging performance improvement over the conventional Rocchio method and is competitive with SVM.
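The pairwise idea in the highlights can be illustrated with a rough sketch. The abstract does not give the paper's actual optimization procedure, so everything below is an assumption made for illustration: the grid of candidate `gammas`, the selection of a weight by training accuracy on the two categories involved, and the majority vote across pairs are stand-ins, and all function names are hypothetical. The point is only that each pair of categories gets its own prototype position instead of one global one.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pair_prototypes(ca, cb, gamma):
    """Prototypes for one category pair: each centroid is pushed away
    from the other by gamma (assumed form, not taken from the paper)."""
    pa = [x - gamma * z for x, z in zip(ca, cb)]
    pb = [z - gamma * x for x, z in zip(ca, cb)]
    return pa, pb


def train_pairwise(X, y, gammas=(0.0, 0.25, 0.5, 1.0)):
    """For every pair of categories, keep the gamma from a small grid that
    classifies the training documents of those two categories best."""
    by_label = defaultdict(list)
    for vec, label in zip(X, y):
        by_label[label].append(vec)
    cents = {label: centroid(vs) for label, vs in by_label.items()}
    model = {}
    for a, b in combinations(sorted(cents), 2):
        docs = [(v, a) for v in by_label[a]] + [(v, b) for v in by_label[b]]

        def accuracy(gamma):
            pa, pb = pair_prototypes(cents[a], cents[b], gamma)
            hits = sum(
                (a if cosine(pa, v) >= cosine(pb, v) else b) == label
                for v, label in docs
            )
            return hits / len(docs)

        best = max(gammas, key=accuracy)  # ties resolve to the first gamma
        model[(a, b)] = pair_prototypes(cents[a], cents[b], best)
    return model


def classify_pairwise(model, vec):
    """Each category pair casts one vote; the most-voted category wins."""
    votes = Counter()
    for (a, b), (pa, pb) in model.items():
        votes[a if cosine(pa, vec) >= cosine(pb, vec) else b] += 1
    return votes.most_common(1)[0][0]


# Toy data with three hypothetical categories.
X = [[3.0, 0.0, 0.0], [2.0, 1.0, 0.0], [0.0, 3.0, 0.0],
     [1.0, 2.0, 0.0], [0.0, 0.0, 3.0], [0.0, 1.0, 2.0]]
y = ["a", "a", "b", "b", "c", "c"]
model = train_pairwise(X, y)
```

Because the negative side of each prototype is a single category rather than a pool of all the others, the weight can be tuned to the separation between those two categories specifically.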
Journal: Pattern Recognition Letters - Volume 32, Issue 2, 15 January 2011, Pages 375–382