Article ID: 862643
Journal: Procedia Engineering
Published Year: 2011
Pages: 6
File Type: PDF
Abstract

A common approach to classifying large-scale text corpora is hierarchical text classification, in which documents are classified in a top-down manner. Top-down methods scale well and cope with changes to the category tree. However, they all share a common problem: a document misclassified at a high level of the hierarchy cannot be recovered at lower levels (the blocking problem). We define a virtual subclass for each non-leaf category so that rejected documents can go back to an ancestor category, thus improving overall performance. Our experiments using Support Vector Machine (SVM) classifiers on the 20 Newsgroups collection show that the evaluated methods all reduce blocking and improve classification accuracy, and that the virtual category method delivers the best performance.
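
The abstract does not spell out the routing procedure, so the following Python sketch shows only one possible reading of the virtual subclass idea. Several things here are assumptions for illustration and do not come from the paper: the two-level grouping of four 20 Newsgroups categories under `comp` and `rec`, the keyword-overlap `score` function (a toy stand-in for trained SVM classifiers), the constant `VIRTUAL_SCORE`, and the helper names `Node`, `classify_greedy`, and `classify_with_virtual`. At each non-leaf category the children compete with a virtual subclass; when the virtual subclass wins, the document is rejected and handed back to the ancestor, which then tries its next-best child.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Assumed constant score of the virtual subclass (hypothetical, not from the paper).
VIRTUAL_SCORE = 0.5


@dataclass
class Node:
    name: str
    keywords: frozenset = frozenset()
    children: List["Node"] = field(default_factory=list)


def score(node: Node, tokens: Set[str]) -> int:
    """Toy stand-in for a trained classifier: count of matching keywords."""
    return len(node.keywords & tokens)


def classify_greedy(node: Node, tokens: Set[str]) -> str:
    """Plain top-down routing: always follow the best child, never go back."""
    while node.children:
        node = max(node.children, key=lambda c: score(c, tokens))
    return node.name


def classify_with_virtual(node: Node, tokens: Set[str]) -> Optional[str]:
    """Top-down routing where each non-leaf category has a virtual subclass.

    Returns a leaf name, or None when the document is rejected at this node
    and should be handed back to the ancestor category.
    """
    if not node.children:
        return node.name
    ranked = sorted(node.children, key=lambda c: score(c, tokens), reverse=True)
    for child in ranked:
        if score(child, tokens) <= VIRTUAL_SCORE:
            return None  # the virtual subclass outranks every remaining child
        label = classify_with_virtual(child, tokens)
        if label is not None:
            return label
    return None


# Assumed toy category tree (the grouping is illustrative only).
root = Node("root", children=[
    Node("comp", frozenset({"computer", "graphics", "windows", "software"}), [
        Node("comp.graphics", frozenset({"graphics", "image", "render"})),
        Node("comp.os.ms-windows.misc", frozenset({"windows", "driver"})),
    ]),
    Node("rec", frozenset({"car", "game", "hockey", "engine"}), [
        Node("rec.autos", frozenset({"car", "wheel"})),
        Node("rec.sport.hockey", frozenset({"hockey", "puck"})),
    ]),
])


def tokenize(text: str) -> Set[str]:
    return set(text.lower().split())


doc = tokenize("the game engine uses software to render each image")
greedy_label = classify_greedy(root, doc)
virtual_label = classify_with_virtual(root, doc)
print(greedy_label, virtual_label)
```

In this toy setup the document matches `rec` more strongly than `comp` at the top level, so greedy routing is blocked inside `rec` and ends at a wrong leaf. With the virtual subclass, `rec` rejects the document because none of its children claims it, and the root then routes it through `comp` instead.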

Related Topics
Physical Sciences and Engineering > Engineering > Engineering (General)