Free article download: Robust multilingual Named Entity Recognition with shallow semi-supervised features

Article code	Journal code	Publication year	English article	Full text
376771	658307	2016	20-page PDF	Free download

English title of the ISI article

Robust multilingual Named Entity Recognition with shallow semi-supervised features

Keywords

Named entity recognition; Information extraction (IE); Clustering; Semi-supervised learning; Natural Language Processing

Related subjects

Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence

Abstract

We present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with clustering semi-supervised features induced on large amounts of unlabeled text. Understanding via empirical experimentation how to effectively combine various types of clustering features allows us to seamlessly export our system to other datasets and languages. The result is a simple but highly competitive system which obtains state of the art results across five languages and twelve datasets. The results are reported on standard shared task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, and despite the lack of linguistically motivated features, we also report best results for languages such as Basque and German. In addition, we demonstrate that our method also obtains very competitive results even when the amount of supervised data is cut by half, alleviating the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models. The system and models are freely available to facilitate its use and guarantee the reproducibility of results.
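
As a rough illustration of the idea of combining shallow local information with clustering features, the sketch below builds a feature dictionary for one token. It is not the authors' implementation: the function names, the particular local features, and the `clusters` lookup (a hypothetical mapping from lowercased words to cluster identifiers induced on unlabeled text) are all assumptions made for this example.

```python
from typing import Dict, List


def word_shape(token: str) -> str:
    """Map each character to a coarse class, e.g. 'Bilbao' -> 'Xxxxxx'."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def token_features(tokens: List[str], i: int, clusters: Dict[str, str]) -> Dict[str, str]:
    """Build features for tokens[i] from local context plus a cluster lookup.

    `clusters` is a hypothetical word -> cluster-id table; how it is induced
    is outside the scope of this sketch.
    """
    token = tokens[i]
    feats = {
        # Shallow local features: no linguistic annotation required.
        "lower": token.lower(),
        "shape": word_shape(token),
        "prefix3": token[:3],
        "suffix3": token[-3:],
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }
    # Cluster feature: added only when the word appears in the lookup table.
    cluster = clusters.get(token.lower())
    if cluster is not None:
        feats["cluster"] = cluster
    return feats
```

A feature dictionary like this could be passed to any sequence labeler that accepts sparse features. The point of the cluster feature is that words unseen in the labeled data can still share a feature with seen words that fall in the same cluster.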

Publisher

Database: Elsevier - ScienceDirect
Journal: Artificial Intelligence - Volume 238, September 2016, Pages 63–82

Authors

Rodrigo Agerri, German Rigau
