Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
405321 | Knowledge-Based Systems | 2011 | 8 Pages |
Due to the rapid growth of free text documents available in digital form, efficient techniques for automatic categorization are of great importance. In this paper, we present ROLEX-SP, an efficient rule-based method for categorizing free text documents. The contributions of this research are the formation of lexical syntactic patterns as basic classification features, a categorization framework that addresses the problem of classifying free text with minimal label description, and a learning algorithm that is efficient in terms of both time complexity and F-measure. The ROLEX-SP framework concentrates on capturing the correct classes of text as well as reducing classification errors. We performed experiments to evaluate the proposed method and to compare it with state-of-the-art methods on a domain-specific source of knowledge. The results indicate that ROLEX-SP outperforms the other methods in terms of standard F-measure in the medical domain, owing to the strong definitions that MeSH descriptions provide for medical categories.
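
For reference, the F-measure used as the evaluation metric is conventionally a weighted harmonic mean of precision and recall. The following is a minimal sketch, assuming the balanced variant (beta = 1) by default; the function name and the counts in the usage note are hypothetical, and the abstract does not state which variant or averaging scheme the paper uses.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Compute F-beta from raw counts for one category.

    tp: documents correctly assigned to the category
    fp: documents wrongly assigned to the category
    fn: documents of the category that were missed
    Returns 0.0 when precision and recall are both zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With hypothetical counts of 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 0.8, so `f_measure(8, 2, 2)` is also 0.8 up to floating-point rounding.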