A logistic regression-based smoothing method for Chinese text categorization

Article code	Journal code	Publication year	English article	Full-text version
385606	660868	2011	10-page PDF	Free download


Keywords

Feature selection; Word segmentation; Logistic regression; Text classification

Related topics

Engineering and basic sciences; Computer engineering; Artificial intelligence


Abstract

Automatic Chinese text classification is an important and well-known technology in the field of machine learning. The first step in solving a Chinese text categorization problem is to tokenize the Chinese words from a sequence of non-segmented sentences. Previous studies often employ a Chinese word tokenizer that was trained on different sources and then apply conventional text classification approaches. However, these tokenizers are not perfect and often provide incorrect word boundary information. In this paper, we propose an N-gram-based language model that takes word relations into account for Chinese text categorization without a Chinese word tokenizer. To address the out-of-vocabulary problem, we also propose a novel smoothing approach based on logistic regression to improve accuracy. The experimental results show that our approach outperforms traditional methods by at least 11% on micro-average F-measure.
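As a rough illustration of working without a word tokenizer, the sketch below extracts overlapping character N-grams directly from unsegmented text. It is not the paper's implementation: the function names `char_ngrams` and `ngram_counts`, the decision to drop whitespace, and the default `max_n=2` are all assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text` (hypothetical helper).

    Whitespace is dropped first; this is an assumption, not a step
    described in the abstract.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_counts(text, max_n=2):
    """Count every character n-gram of length 1..max_n in `text`."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(char_ngrams(text, n))
    return counts


if __name__ == "__main__":
    # No word segmentation is needed: the sentence is sliced character by character.
    print(char_ngrams("我爱北京", 2))
    print(ngram_counts("我爱北京", max_n=2))
```

Because every adjacent character pair becomes a feature, an incorrect word boundary from a tokenizer cannot remove a useful unit from the feature set; the cost is a much larger vocabulary, which is why some form of feature selection is usually needed afterwards.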

Research highlights
► An N-gram language model is selected for Chinese text categorization.
► A novel smoothing method based on logistic regression is proposed.
► The chi-square value is used to examine the importance of each N-gram for feature selection.
► Our approach outperforms traditional methods by at least 11% on micro-average F-measure.
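
The highlights mention chi-square feature selection but this page does not give the formula. The following is a minimal sketch, assuming the standard 2x2 contingency-table form of the chi-square statistic for one N-gram and one class; the function names and the (set of N-grams, label) document representation are hypothetical and may differ from what the paper actually uses.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def chi_square_for_term(docs, term, target_label):
    """Build the contingency table from (ngram_set, label) pairs and score `term`."""
    a = b = c = d = 0
    for ngrams, label in docs:
        has_term = term in ngrams
        in_class = label == target_label
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return chi_square(a, b, c, d)
```

A higher score means the N-gram's presence is more strongly associated with the class, so one common use is to keep only the top-scoring N-grams per class as features.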

Publisher

Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 38, Issue 9, September 2011, Pages 11581–11590

Authors

Show-Jane Yen, Yue-Shi Lee, Jia-Ching Ying, Yu-Chieh Wu
