Article code: 6940891
Journal code: 870309
Publication year: 2016
English article: 10-page PDF
Full text: Free download
English title of the ISI article
Unsupervised morphological segmentation based on affixality measurements
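Persian translation of the title
تقسیم بندی مورفولوژیکی بدون نظارت بر اساس اندازه گیری وابستگی (literally: "Unsupervised morphological segmentation based on dependency measurement")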
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
English abstract
In this paper, we present a method for unsupervised morphological segmentation for multi-slot morphology based on affixality measurements. These measurements quantify three linguistic characteristics of affixes: (1) they combine with many low-frequency word bases (high combinatorial capacity); (2) although they are relatively few, they help to maximize the size of a lexicon (economy principle), i.e. speakers know more words by remembering fewer morphological items; and (3) they are very frequent, so they contain less information than word bases (entropy), i.e. borders between affixes and stems can be detected by finding entropy peaks. Several experiments combining these measurements were conducted to find the best way to apply them to data. The best strategy consists of segmenting successively whenever the average of the affixality measurements exceeds a threshold of 0.5. We also compared this strategy with two state-of-the-art methods for unsupervised morphological segmentation (Morfessor and ParaMor). Our method outperformed both when tested on a hand-made corpus. Results indicate that our proposal is competitive at least for the morphological segmentation of Spanish words.
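The abstract gives no formulas, so the following is only an illustrative sketch of two ideas it mentions: an entropy measure that peaks at a stem/affix border, and the decision rule of segmenting when the average of the measurements exceeds 0.5. The function names `successor_entropy` and `should_segment`, the end-of-word marker `#`, and the choice of next-character entropy are assumptions made for this example, not the authors' definitions.

```python
import math
from collections import Counter


def successor_entropy(prefix, words):
    """Shannon entropy (in bits) of the character following `prefix` in `words`.

    Assumption: "#" stands for end of word. A high value suggests that many
    different continuations follow the prefix, i.e. a possible stem/affix border.
    """
    following = Counter(
        w[len(prefix)] if len(w) > len(prefix) else "#"
        for w in words
        if w.startswith(prefix)
    )
    total = sum(following.values())
    if total == 0:
        return 0.0  # prefix never occurs in the word list
    return -sum((c / total) * math.log2(c / total) for c in following.values())


def should_segment(measures, threshold=0.5):
    """Return True when the mean of the measures exceeds the threshold.

    Assumption: each measure is already normalized to the range [0, 1].
    """
    return sum(measures) / len(measures) > threshold
```

With a toy Spanish word list such as `["canto", "cantas", "canta", "cantamos"]`, the entropy after `can` is zero (only `t` follows), while after `cant` it is positive, which is the kind of peak the abstract associates with a border.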
Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 84, 1 December 2016, Pages 127-133