Article code: 6940237
Journal code: 1450009
Publication year: 2018
English article: 7-page PDF
Full-text version: free download
English title of the ISI article
A probabilistic model derived term weighting scheme for text classification
Persian translation of the title
یک مدل احتمالاتی مشتق شده برای تعریف وزن متناسب است
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Vision and Pattern Recognition
English abstract
Term weighting is a text representation strategy that assigns an appropriate value to each term when the content of a textual document is transformed into a vector in the term space, with the goal of improving text classification performance. Supervised weighting methods, which use the class membership of the training documents, are naturally expected to give better results than unsupervised ones. In this paper, a new weighting scheme is proposed via a matching score function based on a probabilistic model. We introduce a latent variable that indicates whether or not a term carries information useful for text classification, specify conjugate priors, and exploit the conjugacy by integrating out the latent indicator and the parameters. As a result, non-discriminating terms can be assigned weights close to 0. Experimental results with kNN and SVM classifiers illustrate the effectiveness of the proposed approach on both small and large text data sets.
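To make the idea of supervised term weighting concrete, the following is a minimal Python sketch. It is not the probabilistic scheme proposed in the paper, whose formulas are not given in this abstract. It assumes a much simpler, purely illustrative weight: the spread of a term's document frequency across classes. The function names (class_doc_freq, term_weights, vectorize) and the toy corpus are hypothetical. The sketch only shows the general behaviour the abstract describes, namely that a term occurring equally often in every class receives a weight of 0.

```python
from collections import Counter


def class_doc_freq(docs, labels):
    """For each class, the fraction of its documents that contain each term."""
    n_docs = Counter(labels)
    counts = {c: Counter() for c in n_docs}
    for tokens, label in zip(docs, labels):
        counts[label].update(set(tokens))  # count each term once per document
    return {c: {t: k / n_docs[c] for t, k in counts[c].items()} for c in n_docs}


def term_weights(docs, labels):
    """Illustrative supervised weight (an assumption, not the paper's model):
    the gap between a term's highest and lowest per-class document frequency."""
    freq = class_doc_freq(docs, labels)
    vocab = sorted({t for tokens in docs for t in tokens})
    weights = {}
    for t in vocab:
        rates = [freq[c].get(t, 0.0) for c in freq]
        weights[t] = max(rates) - min(rates)
    return weights


def vectorize(tokens, weights, vocab):
    """Map a tokenized document to a vector of term frequency times weight."""
    tf = Counter(tokens)
    return [tf[t] * weights.get(t, 0.0) for t in vocab]


# Toy labelled corpus, invented for this example.
docs = [
    ["the", "match", "ended", "in", "a", "draw"],
    ["the", "team", "won", "the", "match"],
    ["the", "market", "fell", "sharply"],
    ["the", "bank", "raised", "rates"],
]
labels = ["sport", "sport", "finance", "finance"]

weights = term_weights(docs, labels)
vocab = sorted(weights)
vec = vectorize(docs[1], weights, vocab)
```

In this toy corpus, "the" appears in every document of both classes and therefore gets weight 0, while "match" appears in every sport document and no finance document and gets weight 1. The resulting vectors could then be passed to a kNN or SVM classifier, as in the experiments the abstract mentions.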
Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 110, 15 July 2018, Pages 23-29