کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
725044 1461245 2012 11 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Efficient representation of text with multiple perspectives
موضوعات مرتبط
مهندسی و علوم پایه سایر رشته های مهندسی مهندسی برق و الکترونیک
پیش نمایش صفحه اول مقاله
Efficient representation of text with multiple perspectives
چکیده انگلیسی

An effective text representation scheme dominates the performance of text categorization system. However, based on the assumption of independent terms, the traditional schemes which tediously use term frequency (TF) and document frequency (DF) are insufficient for capturing enough information of a document and result in poor performance. To overcome this limitation, we investigate exploring the relationships between different terms of the same class tendency and the way of measuring the importance of a repetitive term in a document. In this paper, a group of novel term weighting factors are proposed to enhance the category contribution for each term. Then, based on a novel strategy of generating passages from document, we present two schemes, the weighted co-contributions of different terms corresponding to the class tendency and the weighted co-contributions for each term in different passages, to achieve improvements on text representation. The prior scheme works in a dimensionality reduction mode while the second one runs in the conventional way. By employing the support vector machine (SVM) classifier, experiments on four benchmark corpora show that the proposed schemes could achieve a consistent better performance than the conventional methods in both efficiency and accuracy. Further analysis also confirms some promising directions for the future works.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: The Journal of China Universities of Posts and Telecommunications - Volume 19, Issue 1, February 2012, Pages 101-111