An HMM-based over-sampling technique to improve text classification

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
383570	660827	2013	9 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Text classification - طبقه بندی متن Hidden Markov model - مدل پنهان مارکوف

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی

پیش نمایش صفحه اول مقاله

An HMM-based over-sampling technique to improve text classification

چکیده انگلیسی

• An over-sampling balancing method based on document content is proposed.
• The technique includes an HMM that generates samples based on existing documents.
• The model is tested with a SVM classifier in two medical document collections.
• Results show the method outperforms another well-used data balancing techniques.

This paper presents a novel over-sampling method based on document content to handle the class imbalance problem in text classification. The new technique, COS-HMM (Content-based Over-Sampling HMM), includes an HMM that is trained with a corpus in order to create new samples according to current documents. The HMM is treated as a document generator which can produce synthetical instances formed on what it was trained with.To demonstrate its achievement, COS-HMM is tested with a Support Vector Machine (SVM) in two medical documental corpora (OHSUMED and TREC Genomics), and is then compared with the Random Over-Sampling (ROS) and SMOTE techniques. Results suggest that the application of over-sampling strategies increases the global performance of the SVM to classify documents. Based on the empirical and statistical studies, the new method clearly outperforms the baseline method (ROS), and offers a greater performance than SMOTE in the majority of tested cases.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Expert Systems with Applications - Volume 40, Issue 18, 15 December 2013, Pages 7184–7192

نویسندگان

E.L. Iglesias, A. Seara Vieira, L. Borrajo,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

An HMM-based over-sampling technique to improve text classification

دسترسی سریع

ارتباط

English Website