Article code: 515205
Journal code: 866968
Publication year: 2007
English article, full-text version: 14 pages, PDF
English title of the ISI article
A hybrid generative/discriminative approach to text classification with additional information
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Applications
Abstract

This paper presents a classifier for text data samples, such as Web pages and technical papers, that consist of main text and additional components. We focus on multiclass, single-labeled text classification problems and design the classifier as a hybrid of probabilistic generative and discriminative approaches. Our formulation considers individual component generative models and constructs the classifier by combining these trained models based on the maximum entropy principle. We use naive Bayes models as the component generative models for the main text and for additional components such as titles, links, and authors, so that we can apply our formulation to document and Web page classification problems. Our experimental results for four test collections confirmed that our hybrid approach effectively combined main text and additional components and thus improved classification performance.
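
The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' exact formulation: it assumes one multinomial naive Bayes model per component (here the hypothetical components "body" and "title"), and a log-linear (maximum entropy) combiner whose component weights `lam` and class biases `mu` are fitted by plain gradient ascent on the conditional log-likelihood. All function names, the toy data, and the training settings are illustrative assumptions, and the combiner is fitted on the same samples used to train the component models, which is a simplification.

```python
import math
from collections import Counter

def train_nb(docs, labels, classes, alpha=1.0):
    """Multinomial naive Bayes for one component: log P(word | class), Laplace-smoothed."""
    vocab = {w for d in docs for w in d}
    model = {}
    for c in classes:
        counts = Counter(w for d, y in zip(docs, labels) if y == c for w in d)
        total = sum(counts.values()) + alpha * len(vocab)
        model[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return model

def nb_loglik(model, doc, c):
    # Words never seen in training are skipped (a simplification).
    return sum(model[c][w] for w in doc if w in model[c])

def features(models, sample, classes):
    # One feature per component: that component's log-likelihood under each class.
    return {c: [nb_loglik(m, sample[name], c) for name, m in models.items()]
            for c in classes}

def posterior(feats, lam, mu, classes):
    # Log-linear combination of the component scores, normalized with a softmax.
    scores = {c: sum(l * f for l, f in zip(lam, feats[c])) + mu[c] for c in classes}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}

def fit_weights(models, samples, labels, classes, lr=0.05, epochs=200):
    """Fit component weights and class biases by stochastic gradient ascent."""
    lam = [1.0] * len(models)
    mu = {c: 0.0 for c in classes}
    for _ in range(epochs):
        for s, y in zip(samples, labels):
            feats = features(models, s, classes)
            p = posterior(feats, lam, mu, classes)
            for j in range(len(lam)):
                expected = sum(p[c] * feats[c][j] for c in classes)
                lam[j] += lr * (feats[y][j] - expected)
            for c in classes:
                mu[c] += lr * ((1.0 if c == y else 0.0) - p[c])
    return lam, mu

# Toy data (hypothetical): each sample has a main text and a title component.
train = [
    {"body": "goal match team score".split(), "title": "football cup".split()},
    {"body": "team player win match".split(), "title": "league final".split()},
    {"body": "cpu memory code compile".split(), "title": "python release".split()},
    {"body": "code bug compile kernel".split(), "title": "linux patch".split()},
]
labels = ["sports", "sports", "tech", "tech"]
classes = ["sports", "tech"]

models = {name: train_nb([s[name] for s in train], labels, classes)
          for name in ("body", "title")}
lam, mu = fit_weights(models, train, labels, classes)

def classify(sample):
    p = posterior(features(models, sample, classes), lam, mu, classes)
    return max(p, key=p.get)
```

Calling `classify({"body": "team match goal".split(), "title": "cup final".split()})` scores each class from both components at once. The learned weights in `lam` decide how much the title contributes relative to the main text.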

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 43, Issue 2, March 2007, Pages 379–392