Article code: 377038
Journal code: 658356
Publication year: 2012
English article: 35-page PDF
Full-text version: Free download
English title of the ISI article
GAMoN: Discovering M-of-N{¬,∨} hypotheses for text classification by a lattice-based Genetic Algorithm
Related topics
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
English abstract

While there has been a long history of rule-based text classifiers, to the best of our knowledge, no M-of-N-based approach for text categorization has so far been proposed. In this paper we argue that M-of-N hypotheses are particularly suitable to model the text classification task because of the so-called “family resemblance” metaphor: “the members (i.e., documents) of a family (i.e., category) share some small number of features, yet there is no common feature among all of them. Nevertheless, they resemble each other”. Starting from this conjecture, we provide a sound extension of the M-of-N approach with negation and disjunction, called M-of-N{¬,∨}, which makes it possible to best fit the true structure of the data. Based on a thorough theoretical study, we show that the M-of-N{¬,∨} hypothesis space has two partial orders that form complete lattices. GAMoN is the task-specific Genetic Algorithm (GA) which, by exploiting the lattice-based structure of the hypothesis space, efficiently induces accurate M-of-N{¬,∨} hypotheses. Benchmarking was performed over 13 real-world text data sets, using four rule induction algorithms: two GAs, namely BioHEL and OlexGA, and two non-evolutionary algorithms, namely C4.5 and Ripper. Further, we included linear SVM in our study, as it is reported to be among the best methods for text categorization. Experimental results demonstrate that GAMoN delivers state-of-the-art classification performance, providing a good balance between accuracy and model complexity. Further, they show that GAMoN can scale up to large and realistic real-world domains better than both C4.5 and Ripper.
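
To make the "family resemblance" idea concrete, the following is a minimal illustrative sketch of how an M-of-N rule with negation and disjunction might be evaluated on a document. It is not taken from the paper: the class name `MofNRule`, the function `classify`, the exact semantics chosen here (a rule fires when at least `m` of its positive terms occur and none of its negative terms occur; a hypothesis is a disjunction of such rules), and the sample terms are all assumptions made for illustration. It does not implement GAMoN's genetic search or its lattice structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MofNRule:
    """Hypothetical M-of-N rule with negation (assumed semantics, not the paper's).

    Fires when at least `m` of the `positive` terms occur in the document
    and none of the `negative` terms occur.
    """
    m: int
    positive: frozenset
    negative: frozenset = frozenset()

    def matches(self, doc_terms: set) -> bool:
        # Negation: any negative term present blocks the rule.
        if self.negative & doc_terms:
            return False
        # M-of-N: count how many positive terms the document contains.
        return len(self.positive & doc_terms) >= self.m


def classify(rules, doc_terms) -> bool:
    """Disjunction: the document belongs to the category if any rule fires."""
    return any(rule.matches(doc_terms) for rule in rules)


# Illustrative hypothesis for a made-up "grain" category.
rules = [
    MofNRule(m=2,
             positive=frozenset({"wheat", "grain", "harvest", "crop"}),
             negative=frozenset({"stock"})),
    MofNRule(m=1, positive=frozenset({"tonnes"})),
]

doc_a = {"wheat", "crop", "export"}      # 2 of 4 positive terms
doc_b = {"grain", "harvest", "weather"}  # 2 of 4, none shared with doc_a
doc_c = {"wheat", "crop", "stock"}       # blocked by the negative term
doc_d = {"grain", "price"}               # only 1 of 4 positive terms

results = [classify(rules, d) for d in (doc_a, doc_b, doc_c, doc_d)]
```

In this sketch `doc_a` and `doc_b` are both accepted although they share no term, which is the family-resemblance behaviour the abstract describes: members of a category each exhibit some of a small set of features without any single feature being common to all.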

Publisher
Database: Elsevier - ScienceDirect
Journal: Artificial Intelligence - Volumes 191–192, November 2012, Pages 61-95