A Bayesian feature selection paradigm for text classification

Article code	Journal code	Publication year	English article	Full text
515656	867059	2012	20 pages, PDF	Free download

Keywords

Text classification; Mixture model

Related topics

Engineering and Basic Sciences, Computer Engineering, Computer Science Applications

Abstract

The automated classification of texts into predefined categories has attracted rapidly growing interest, due to the increased availability of documents in digital form and the ensuing need to organize them. An important problem for text classification is feature selection, whose goals are to improve classification effectiveness, computational efficiency, or both. Because of category imbalance and feature sparsity in social text collections, filter methods may work poorly. In this paper, we perform feature selection in the training process, automatically selecting the best feature subset by learning, from a set of preclassified documents, the characteristics of the categories. We propose a generative probabilistic model that describes categories by distributions and handles the feature selection problem by introducing a binary exclusion/inclusion latent vector, which is updated via an efficient Metropolis search. Real-life examples illustrate the effectiveness of the approach.

► Feature subsets can be scored by a posterior distribution in text classification.
► We handle feature selection by introducing a latent vector in a generative model.
► A Metropolis search is suggested to find the best feature subset automatically.
► Real-life examples illustrate the dichotomization of words in text classification.
► Our method improves classification effectiveness and sharply reduces the feature set.
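
The idea of scoring feature subsets by a posterior and searching over a binary inclusion vector can be illustrated with a small sketch. This is not the paper's model: it assumes a simple Beta-Bernoulli word-presence model, in which an included word gets one presence probability per category and an excluded word shares a single probability across all categories. The function names (log_marginal, feature_gains, metropolis_search) and the prior_inclusion parameter are hypothetical, chosen for illustration only.

```python
import math
import random


def log_beta(a, b):
    """Log of the Beta function, computed with log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(k, n, a=1.0, b=1.0):
    """Log marginal likelihood of k presences in n documents
    under a Beta(a, b) prior (assumed model, not the paper's)."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def feature_gains(docs, labels, n_features):
    """For each word, log evidence of the category-specific model
    minus log evidence of the shared (category-independent) model."""
    classes = sorted(set(labels))
    n_docs = {c: 0 for c in classes}
    present = {c: [0] * n_features for c in classes}
    for doc, y in zip(docs, labels):
        n_docs[y] += 1
        for j in set(doc):  # presence/absence only, not term counts
            present[y][j] += 1
    gains = []
    for j in range(n_features):
        included = sum(log_marginal(present[c][j], n_docs[c]) for c in classes)
        total = sum(present[c][j] for c in classes)
        excluded = log_marginal(total, len(docs))
        gains.append(included - excluded)
    return gains


def metropolis_search(gains, prior_inclusion=0.5, n_iter=2000, seed=0):
    """Metropolis search over the binary inclusion vector gamma.
    Proposal: flip one uniformly chosen coordinate (symmetric), accepted
    with probability min(1, posterior ratio). Returns the best state seen."""
    rng = random.Random(seed)
    log_prior_odds = math.log(prior_inclusion / (1.0 - prior_inclusion))
    gamma = [0] * len(gains)
    score = 0.0  # log posterior relative to the empty subset
    best, best_score = gamma[:], score
    for _ in range(n_iter):
        j = rng.randrange(len(gains))
        sign = 1 if gamma[j] == 0 else -1
        delta = sign * (gains[j] + log_prior_odds)
        if delta >= 0 or rng.random() < math.exp(delta):
            gamma[j] ^= 1
            score += delta
            if score > best_score:
                best, best_score = gamma[:], score
    return best, best_score


# Toy corpus: each document is a list of word indices (made-up data).
docs = [[0, 1], [0, 1], [0, 1, 2], [0, 1], [1], [1, 2], [1], [1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
best, best_score = metropolis_search(feature_gains(docs, labels, 3))
print(best, best_score)
```

In this simplified model the posterior factorizes over words, so the best subset could also be read off directly from the sign of each gain; the Metropolis loop is kept only to show the shape of the search, which matters when the score does not factorize.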

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 48, Issue 2, March 2012, Pages 283–302

Authors

Guozhong Feng, Jianhua Guo, Bing-Yi Jing, Lizhu Hao
