Free article download: A lexicon of multiword expressions for linguistically precise, wide-coverage natural language processing

Article code	Journal code	Publication year	English article	Full text
559026	875034	2014	23-page PDF	Free download

English title of the ISI article

A lexicon of multiword expressions for linguistically precise, wide-coverage natural language processing

Keywords

Lexicon; Dependency structure; Natural language processing

Related subjects

Engineering and Basic Sciences; Computer Engineering; Signal Processing

Abstract

• We present a lexicon of multiword expressions compiled for Japanese language processing.
• It contains 111,000 expressions and more than 820,000 notational variants.
• It gives the syntactic functions, structures, and flexibilities of each expression.
• Its validity is supported by comparison with a large N-gram frequency dataset, LDC2009T08.
• The lexicon was compiled manually, i.e., outside the statistical “empiricism” of the past 30 years.

Since Sag et al. (2002) highlighted a key problem that had been underappreciated in the past in natural language processing (NLP), namely idiosyncratic multiword expressions (MWEs) such as idioms, quasi-idioms, clichés, quasi-clichés, institutionalized phrases, proverbs and old sayings, and how to deal with them, many attempts have been made to extract these expressions from corpora and construct a lexicon of them. However, no extensive, reliable solution has yet been realized. This paper presents an overview of a comprehensive lexicon of Japanese multiword expressions (Japanese MWE Lexicon: JMWEL), which has been compiled in order to realize linguistically precise and wide-coverage natural Japanese processing systems. The JMWEL is characterized by significant notational, syntactic, and semantic diversity as well as a detailed description of the syntactic functions, structures, and flexibilities of MWEs. The lexicon contains about 111,000 header entries written in kana (phonetic characters) and their almost 820,000 variants written in kana and kanji (ideographic characters). The paper demonstrates the JMWEL's validity, supported mainly by comparing the lexicon with a large-scale Japanese N-gram frequency dataset, namely the LDC2009T08 generated by Google Inc. (Kudo and Kazawa, 2009). The present work is an attempt to provide a tentative answer for Japanese, from outside statistical empiricism, to the question posed by Church (2011): “How many multiword expressions do people know?”

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 28, Issue 6, November 2014, Pages 1317–1339

Authors

Toshifumi Tanabe, Masahito Takahashi, Kosho Shudo
