Information gain and divergence-based feature selection for machine learning-based text categorization

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
515314	866982	2006	11 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Feature selection - انتخاب ویژگی Text categorization - طبقه بندی متن

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر نرم افزارهای علوم کامپیوتر

پیش نمایش صفحه اول مقاله

Information gain and divergence-based feature selection for machine learning-based text categorization

چکیده انگلیسی

Most previous works of feature selection emphasized only the reduction of high dimensionality of the feature space. But in cases where many features are highly redundant with each other, we must utilize other means, for example, more complex dependence models such as Bayesian network classifiers. In this paper, we introduce a new information gain and divergence-based feature selection method for statistical machine learning-based text categorization without relying on more complex dependence models. Our feature selection method strives to reduce redundancy between features while maintaining information gain in selecting appropriate features for text categorization. Empirical results are given on a number of dataset, showing that our feature selection method is more effective than Koller and Sahami’s method [Koller, D., & Sahami, M. (1996). Toward optimal feature selection. In Proceedings of ICML-96, 13th international conference on machine learning], which is one of greedy feature selection methods, and conventional information gain which is commonly used in feature selection for text categorization. Moreover, our feature selection method sometimes produces more improvements of conventional machine learning algorithms over support vector machines which are known to give the best classification accuracy.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Processing & Management - Volume 42, Issue 1, January 2006, Pages 155–165

نویسندگان

Changki Lee, Gary Geunbae Lee,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Information gain and divergence-based feature selection for machine learning-based text categorization

دسترسی سریع

ارتباط

English Website