Article ID	Journal ID	Year	English article
515960	867156	2008	15-page PDF

Title

Language morphology offset: Text classification on a Croatian–English parallel corpus

Keywords

Croatian; Feature selection; English; Stemming; Text classification; SVM

Abstract

We investigate how, and to what extent, the morphological complexity of a language influences text classification using support vector machines (SVM). The Croatian–English parallel corpus provides the basis for a direct comparison of two languages of radically different morphological complexity. We quantified, compared, and statistically tested the effects of morphological normalisation on SVM classifier performance in a series of parallel experiments on both languages, carried out over a wide range of feature subset sizes obtained by different feature selection methods, and applying different levels of morphological normalisation. We also quantified the trade-off between feature space size and performance for different levels of morphological normalisation, and compared the results for both languages. Our experiments have shown that the improvements in SVM classifier performance are statistically significant; they are greater for small and medium numbers of features, especially for Croatian, whereas for large numbers of features the improvements are rather small and may be negligible in practice for both languages.
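The abstract's central point is that a highly inflected language such as Croatian spreads one word over many surface forms, so morphological normalisation shrinks its feature space far more than it does for English. The sketch below illustrates only that effect, in miniature, assuming a hypothetical naive suffix-stripping normaliser (`strip_suffix`, with made-up suffix lists `HR_SUFFIXES` and `EN_SUFFIXES`). It is not the normalisation method used in the paper, and real Croatian inflection is far richer than this toy rule set.

```python
# Toy illustration only (not the paper's method): a naive suffix-stripping
# normaliser applied to inflected forms of one Croatian noun and one English
# noun, showing how many more surface forms collapse into one feature in Croatian.

# Hypothetical suffix lists, checked in order (longest first for Croatian).
HR_SUFFIXES = ("ama", "om", "a", "e", "i", "u")
EN_SUFFIXES = ("s",)


def strip_suffix(word: str, suffixes: tuple, min_stem: int = 3) -> str:
    """Remove the first listed suffix that matches, keeping at least min_stem characters."""
    for suf in suffixes:
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[: -len(suf)]
    return word


def vocab_sizes(tokens: list, suffixes: tuple) -> tuple:
    """Return (distinct surface forms, distinct normalised forms)."""
    raw = set(tokens)
    normalised = {strip_suffix(t, suffixes) for t in raw}
    return len(raw), len(normalised)


# Inflected forms of Croatian "kuća" (house) and English "house".
croatian = ["kuća", "kuće", "kući", "kuću", "kućom", "kućama"]
english = ["house", "houses"]

hr_raw, hr_norm = vocab_sizes(croatian, HR_SUFFIXES)
en_raw, en_norm = vocab_sizes(english, EN_SUFFIXES)
print(f"Croatian: {hr_raw} forms -> {hr_norm} feature(s)")  # Croatian: 6 forms -> 1 feature(s)
print(f"English:  {en_raw} forms -> {en_norm} feature(s)")  # English:  2 forms -> 1 feature(s)
```

Without normalisation, each of the six Croatian forms would be a separate feature competing for a place in a small selected feature subset, which is consistent with the abstract's finding that normalisation helps most, and most for Croatian, when the number of features is small.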

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 44, Issue 1, January 2008, Pages 325–339

Authors

M. Malenica, T. Šmuc, J. Šnajder, B. Dalbelo Bašić
