Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6862760 | Knowledge-Based Systems | 2013 | 10 Pages |
Abstract
The rapid growth of digital information requires automated handling and organization of documents. The two main stages in automated document categorization are (i) term reduction and (ii) classification. In this paper, we present a novel two-stage term reduction strategy based on Information Gain (IG) theory and Geometric Particle Swarm Optimization (GPSO) search. We evaluate the performance of the proposed term reduction approach using a new classifier, the fuzzy unordered rule induction algorithm (FURIA), to categorize multi-label texts. To evaluate the performance of FURIA quantitatively, we compare it against two widely used algorithms, Naïve Bayes and Support Vector Machine (SVM). The text categorization (TC) performance of the proposed term reduction strategy is validated on the Reuters-21578 and OHSUMED text collections. The experimental results show that the proposed term reduction method performs efficiently in document organization tasks.
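To make the Information Gain stage concrete, the following is a minimal sketch of IG-based term ranking and is not the authors' implementation. It assumes each document is a set of terms with a single class label (the paper itself addresses multi-label texts), and the function names `entropy`, `information_gain`, and `top_k_terms`, as well as the toy corpus, are hypothetical. The GPSO search stage is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document' w.r.t. the labels."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Class entropy remaining after splitting on presence/absence of the term.
    conditional = ((len(present) / n) * entropy(present)
                   + (len(absent) / n) * entropy(absent))
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Keep the k terms with the highest IG (a simple filter-style reduction)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Toy corpus (hypothetical data, for illustration only).
docs = [{"oil", "price"}, {"oil", "export"},
        {"wheat", "price"}, {"wheat", "harvest"}]
labels = ["crude", "crude", "grain", "grain"]

selected = top_k_terms(docs, labels, 2)
```

On this toy corpus, "oil" and "wheat" separate the two classes perfectly and are retained, while "price" occurs equally in both classes and has zero gain. In a two-stage scheme such as the one described above, a filter of this kind would shrink the vocabulary before a more expensive search procedure explores subsets of the remaining terms.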
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Mustafa Karabulut,