Article code: 516005
Journal code: 867162
Publication year: 2007
English article: 12 pages
Full-text version: PDF
ISI article title
Virtual relevant documents in text categorization with support vector machines
Related subjects
Engineering and Basic Sciences > Computer Engineering > Computer Science Software
Abstract

This paper explores incorporating prior knowledge into support vector machines (SVMs) as a way to compensate for a shortage of training data in text categorization. Prior knowledge about transformation invariance is generated by a virtual document method. The method applies a simple transformation to the training set: it creates virtual documents by combining pairs of documents that are relevant to the same topic. Each virtual document is expected not only to preserve the topic but also to improve its representation, by reinforcing relevant terms that receive little weight in the individual real documents. These artificially generated documents change the distribution of the training data without introducing randomization. Experiments with SVMs using linear, polynomial, and radial-basis function kernels showed the method to be effective on the Reuters-21578 collection for topics with few relevant documents. The proposed method improved micro-averaged F1 by 131%, 34%, and 12% for the 25, 46, and 58 topics with fewer than 10, 30, and 50 relevant training documents, respectively. Analysis of the results indicates that incorporating virtual documents yields a steady improvement in performance.
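To make the virtual-document idea concrete, here is a minimal Python sketch of augmenting one binary topic's training set. It rests on assumptions that the abstract does not confirm: every pair of relevant documents is combined, a pair is combined by summing raw term frequencies (equivalent to concatenating the two texts), and vectors are L2-normalized bag-of-words. The names `term_vector`, `l2_normalize`, `make_virtual_documents`, and `augment_training_set` are hypothetical helpers for illustration, not the authors' code. The SVM training step itself is omitted; the resulting vectors and labels would be passed to whichever linear, polynomial, or RBF-kernel SVM trainer is in use.

```python
import math
from collections import Counter
from itertools import combinations


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def l2_normalize(vec: Counter) -> dict[str, float]:
    """Scale a term-frequency vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def make_virtual_documents(relevant_docs: list[str]) -> list[dict[str, float]]:
    """Build one virtual document per pair of relevant documents.

    Hypothetical combination rule: add the two term-frequency vectors,
    so terms shared by both documents get a larger normalized weight
    than they have in either document alone.
    """
    vectors = [term_vector(d) for d in relevant_docs]
    return [l2_normalize(a + b) for a, b in combinations(vectors, 2)]


def augment_training_set(
    docs: list[str], labels: list[int]
) -> tuple[list[dict[str, float]], list[int]]:
    """Return real vectors plus virtual positives for one binary topic.

    labels: 1 = relevant to the topic, 0 = not relevant.
    """
    vectors = [l2_normalize(term_vector(d)) for d in docs]
    positives = [d for d, y in zip(docs, labels) if y == 1]
    virtual = make_virtual_documents(positives)
    return vectors + virtual, list(labels) + [1] * len(virtual)


if __name__ == "__main__":
    docs = ["wheat grain export", "grain harvest prices",
            "wheat tariff", "stock market rally"]
    labels = [1, 1, 1, 0]
    X, y = augment_training_set(docs, labels)
    print(len(X), y)  # 7 [1, 1, 1, 0, 1, 1, 1]
```

Under the all-pairs assumption, a topic with n relevant documents gains n(n-1)/2 virtual positives (for example, 45 for n = 10). That quadratic growth is largest relative to the real data for exactly the sparse topics on which the abstract reports the biggest gains.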

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 43, Issue 4, July 2007, Pages 902–913
Authors
, ,