Article ID: 6862760
Journal: Knowledge-Based Systems
Published Year: 2013
Pages: 10 Pages
File Type: PDF
Abstract
The rapid growth of digital information calls for automated handling and organization of documents. Automated document categorization has two main stages: (i) term reduction and (ii) classification. In this paper, we present a novel two-stage term reduction strategy based on Information Gain (IG) theory and Geometric Particle Swarm Optimization (GPSO) search. We evaluate the proposed term reduction approach using a new classifier, the fuzzy unordered rule induction algorithm (FURIA), to categorize multi-label texts. To evaluate the performance of FURIA quantitatively, we compare it against two widely used algorithms, Naïve Bayes and Support Vector Machine (SVM). The text categorization (TC) performance of the proposed term reduction strategy is validated on the Reuters-21578 and OHSUMED text collections. The experimental results show that the proposed term reduction method is efficient for document organization tasks.
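As a rough illustration of the first, IG-based stage only, the sketch below ranks terms by information gain and keeps the top k. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `top_terms_by_ig`), the toy corpus, and the simplification to one label per document are all assumptions made for the example, and the GPSO search stage is not shown.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) * H(C | t) + P(not t) * H(C | not t)].

    docs   : list of sets of tokens (hypothetical representation)
    labels : one class label per document (single-label simplification)
    """
    n = len(docs)
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    n_with = sum(with_term.values())
    n_without = n - n_with
    h_class = entropy(list(Counter(labels).values()))
    h_cond = (n_with / n) * entropy(list(with_term.values())) \
        + (n_without / n) * entropy(list(without_term.values()))
    return h_class - h_cond


def top_terms_by_ig(docs, labels, k):
    """Return the k terms with the highest IG (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    scored = [(information_gain(docs, labels, t), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Toy corpus, invented for illustration only.
docs = [{"wheat", "price"}, {"wheat", "export"},
        {"oil", "price"}, {"oil", "barrel"}]
labels = ["grain", "grain", "crude", "crude"]

print(top_terms_by_ig(docs, labels, 2))
```

In this toy corpus, "wheat" and "oil" separate the two classes perfectly, while "price" appears equally in both and carries no information, so an IG filter would discard it before any further search over term subsets.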
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence