Article code: 402312
Journal code: 676897
Publication year: 2015
English article: 14-page PDF
Full-text version: free download
English title of the ISI article
Term-weighting learning via genetic programming for text classification
Persian translation of the title
یادگیری ترتیب وزن از طریق برنامه نویسی ژنتیکی برای طبقه بندی متن
Keywords
Term-weighting learning, genetic programming, text mining, representation learning, bag of words
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract


• A new method for learning term-weighting schemes is proposed.
• A genetic program searches for the scheme that maximizes classification performance.
• The method is evaluated in text and image categorization, and authorship attribution.

This paper describes a novel approach to learning term-weighting schemes (TWSs) for text classification. In text mining, a TWS determines how documents are represented in a vector space model before a classifier is applied. Although standard TWSs (e.g., Boolean and term-frequency schemes) achieve acceptable performance, defining TWSs has traditionally been an art. Moreover, it remains difficult to determine which TWS is best for a particular problem, and it is not yet clear whether schemes better than those currently available can be generated by combining known TWSs. We propose a genetic program that aims to learn effective TWSs that improve on the performance of current schemes in text classification. The genetic program learns how to combine a set of basic units into discriminative TWSs. We report an extensive experimental study comprising data sets from thematic and non-thematic text classification as well as from image classification. The study supports the validity of the proposed method: TWSs learned with the genetic program outperform traditional schemes and other TWSs proposed in recent work. Further, we show that TWSs learned for a specific domain can be used effectively for other tasks.
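
To make the idea of combining basic units concrete, here is a minimal Python sketch of how a candidate TWS could be stored as an expression tree and evaluated to build document vectors. It is an illustration under stated assumptions, not the method from the paper: the toy corpus, the choice of TF and IDF as basic units, the two operators, and all function names are hypothetical, and the genetic search itself is not shown.

```python
import math
from collections import Counter

# Toy corpus (hypothetical data, for illustration only).
DOCS = [
    "genetic programming evolves programs",
    "text classification assigns labels to text",
    "term weighting represents text as vectors",
]

def basic_units(docs):
    """Compute per-document term frequencies and corpus-level IDF.

    TF and IDF are assumed basic units; the paper's actual set is not
    given in the abstract.
    """
    tokenized = [d.split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    tf = [Counter(doc) for doc in tokenized]
    n = len(docs)
    df = {t: sum(1 for doc in tokenized if t in doc) for t in vocab}
    idf = {t: math.log(n / df[t]) for t in vocab}
    return vocab, tf, idf

# A candidate TWS is an expression tree: a leaf names a basic unit or a
# numeric constant, and an inner node is a tuple (operator, left, right).
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def evaluate(tree, tf_value, idf_value):
    """Evaluate a TWS tree for one (term, document) pair."""
    if tree == "TF":
        return tf_value
    if tree == "IDF":
        return idf_value
    if isinstance(tree, (int, float)):
        return float(tree)
    op, left, right = tree
    return OPS[op](evaluate(left, tf_value, idf_value),
                   evaluate(right, tf_value, idf_value))

def represent(docs, tree):
    """Build the document-term matrix induced by a TWS tree."""
    vocab, tf, idf = basic_units(docs)
    matrix = [[evaluate(tree, tf_d[t], idf[t]) for t in vocab] for tf_d in tf]
    return vocab, matrix

TF_SCHEME = "TF"                    # plain term frequency
TFIDF_SCHEME = ("*", "TF", "IDF")   # a familiar combination of two units

vocab, matrix = represent(DOCS, TFIDF_SCHEME)
```

In a genetic program, trees like `TFIDF_SCHEME` would be mutated and recombined, and each candidate would be scored by the classification performance obtained with the resulting representation.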

Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 83, July 2015, Pages 176–189