Article code: 380570
Journal code: 1437444
Publication year: 2014
English article: 16 pages
Full-text version: PDF
English title of the ISI article
A novel framework for termset selection and weighting in binary text classification
Keywords
Co-occurrence features, termset selection, termset weighting, document representation, text classification
Related subjects
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

This study presents a new framework for termset selection and weighting that is based on the joint occurrence statistics of pairs of terms. More specifically, each termset is evaluated by taking into account the simultaneous or individual occurrences of the terms within the termset. Based on the idea that the occurrence of one term but not the other may also convey valuable information for discrimination, conventional term selection schemes are adapted for termset selection. Similarly, the weight of a selected termset is computed as a function of which of its terms occur in the document under consideration: a termset is assigned a nonzero weight if either or both of its terms appear in the document. This weight estimation scheme evaluates the individual occurrences of the terms and their co-occurrences separately in order to compute the document-specific weight of each termset. The proposed termset-based representation is concatenated with the bag-of-words representation to construct the document vectors. Experiments conducted on three widely used datasets verify the effectiveness of the proposed framework.
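The abstract does not state the exact selection or weighting formulas, so the following Python sketch only illustrates the general idea under explicit assumptions. Each term pair is split into three occurrence patterns (both terms, only the first, only the second). For selection, each pattern is treated as a separate event, a standard 2x2 chi-square statistic against the positive class is computed for it, and the pair keeps the maximum. For weighting, a selected termset's value in a document is an rf-style (relevance frequency) score for whichever pattern occurs, which is always at least 1, so the weight is nonzero whenever either or both terms appear. The helper names, the max-over-patterns rule, and the choice of chi-square and rf are illustrative assumptions, not the authors' method.

```python
import math
from itertools import combinations

# Hypothetical sketch: documents are sets of terms, labels are 1 (positive) or 0 (negative).
PATTERNS = ("both", "only_first", "only_second")


def occurrence_pattern(doc, pair):
    """Which of the two terms in `pair` occur in `doc`; None if neither does."""
    a, b = pair
    if a in doc and b in doc:
        return "both"
    if a in doc:
        return "only_first"
    if b in doc:
        return "only_second"
    return None


def pattern_counts(docs, labels, pair):
    """For each pattern, count [positive, negative] training documents showing it."""
    counts = {p: [0, 0] for p in PATTERNS}
    for doc, y in zip(docs, labels):
        p = occurrence_pattern(doc, pair)
        if p is not None:
            counts[p][0 if y == 1 else 1] += 1
    return counts


def chi_square(n11, n10, n01, n00):
    """2x2 chi-square between an event and the positive class."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n10 * n01) ** 2 / denom


def termset_score(docs, labels, pair):
    """Assumed adaptation: score each pattern as its own event, keep the best."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    counts = pattern_counts(docs, labels, pair)
    return max(chi_square(pos, neg, n_pos - pos, n_neg - neg)
               for pos, neg in counts.values())


def select_termsets(docs, labels, vocab, k):
    """Rank all term pairs by termset_score (stable on ties) and keep the top k."""
    pairs = list(combinations(sorted(vocab), 2))
    pairs.sort(key=lambda pr: -termset_score(docs, labels, pr))
    return pairs[:k]


def pattern_weights(docs, labels, pair):
    """rf-style weight per pattern: log2(2 + pos / max(1, neg)), always >= 1."""
    counts = pattern_counts(docs, labels, pair)
    return {p: math.log2(2 + pos / max(1, neg)) for p, (pos, neg) in counts.items()}


def document_vector(doc, vocab, termsets, weights):
    """Binary bag-of-words part concatenated with one weight per selected termset."""
    bow = [1.0 if t in doc else 0.0 for t in sorted(vocab)]
    extra = []
    for pair in termsets:
        p = occurrence_pattern(doc, pair)
        extra.append(0.0 if p is None else weights[pair][p])
    return bow + extra


# Toy usage with made-up data.
docs = [{"good", "movie"}, {"good", "plot"}, {"bad", "movie"}, {"bad", "plot"}]
labels = [1, 1, 0, 0]
vocab = set().union(*docs)
termsets = select_termsets(docs, labels, vocab, k=2)
weights = {pr: pattern_weights(docs, labels, pr) for pr in termsets}
print(termsets)  # [('bad', 'good'), ('bad', 'movie')]
vec = document_vector({"good", "movie"}, vocab, termsets, weights)
print([round(v, 3) for v in vec])  # [0.0, 1.0, 1.0, 0.0, 2.0, 1.585]
```

Scoring every pair is quadratic in the vocabulary size, so a practical version would probably restrict candidate pairs, for example to terms that already pass an ordinary term-selection filter.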

Publisher
Database: Elsevier - ScienceDirect
Journal: Engineering Applications of Artificial Intelligence - Volume 35, October 2014, Pages 38–53