Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
4958289 | 1445244 | 2016 | 23-page PDF | Free download |
English title of the ISI article
Improving the text classification using clustering and a novel HMM to reduce the dimensionality
Related subjects
Engineering and Basic Sciences
Computer Engineering
Computer Science (General)
English abstract
In text classification problems, the representation of a document has a strong impact on the performance of learning systems. The high dimensionality of the classical structured representations can lead to burdensome computations due to the great size of real-world data. Consequently, there is a need for reducing the quantity of handled information to improve the classification process. In this paper, we propose a method to reduce the dimensionality of a classical text representation based on a clustering technique to group documents, and a previously developed Hidden Markov Model to represent them. We have applied tests with the k-NN and SVM classifiers on the OHSUMED and TREC benchmark text corpora using the proposed dimensionality reduction technique. The experimental results obtained are very satisfactory compared to commonly used techniques like InfoGain and the statistical tests performed demonstrate the suitability of the proposed technique for the preprocessing step in a text classification task.
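The abstract names InfoGain as a commonly used baseline for reducing dimensionality. As a rough illustration of that baseline only (not the clustering and HMM method the paper proposes), the sketch below ranks terms by information gain and keeps the top k. The toy corpus, the labels, and the helper names `entropy`, `information_gain`, and `select_terms` are assumptions made for this example; documents are modelled as sets of terms, so each feature is simply "term present or absent".

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Class entropy remaining after splitting the corpus on the term.
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_terms(docs, labels, k):
    """Return the k terms with the highest information gain."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(vocabulary,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"heart", "attack", "patient"},
    {"heart", "surgery", "patient"},
    {"lung", "cancer", "patient"},
    {"lung", "infection", "patient"},
]
labels = ["cardio", "cardio", "pulmo", "pulmo"]

top_terms = select_terms(docs, labels, 2)
# Project every document onto the reduced vocabulary.
reduced_docs = [d & set(top_terms) for d in docs]
```

In this toy corpus a term such as "patient", which occurs in every document, has zero information gain and is dropped, while terms that separate the two classes are kept. A classifier such as k-NN or SVM would then be trained on the reduced representation.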
Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Methods and Programs in Biomedicine - Volume 136, November 2016, Pages 119-130
Authors
A. Seara Vieira, L. Borrajo, E.L. Iglesias