Virtual relevant documents in text categorization with support vector machines

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
516005	867162	2007	12 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Prior knowledge - دانش قبلی Text categorization - طبقه بندی متن Support vectors - پشتیبانی بردارها

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر نرم افزارهای علوم کامپیوتر

پیش نمایش صفحه اول مقاله

Virtual relevant documents in text categorization with support vector machines

چکیده انگلیسی

This paper explores the incorporation of prior knowledge into support vector machines as a means of compensating for a shortage of training data in text categorization. The prior knowledge about transformation invariance is generated by a virtual document method. The method applies a simple transformation to documents, i.e., making virtual documents by combining relevant document pairs for a topic in the training set. The virtual document thus created not only is expected to preserve the topic, but even improve the topical representation by exploiting relevant terms that are not given high importance in individual real documents. Artificially generated documents result in the change in the distribution of training data without the randomization. Experiments with support vector machines based on linear, polynomial and radial-basis function kernels showed the effectiveness on Reuters-21578 set for the topics with a small number of relevant documents. The proposed method achieved 131%, 34%, 12% improvements in micro-averaged F1 for 25, 46, and 58 topics with less than 10, 30, and 50 relevant documents in learning, respectively. The result analysis indicates that incorporating virtual documents contributes to a steady improvement on the performance.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Processing & Management - Volume 43, Issue 4, July 2007, Pages 902–913

نویسندگان

Kyung-Soon Lee, Kyo Kageura,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Virtual relevant documents in text categorization with support vector machines

دسترسی سریع

ارتباط

English Website