Exploring phrasal context and error correction heuristics in bootstrapping for geographic named entity annotation

Publication year	English full text
2007	18 pages (PDF)


Keywords

Bootstrapping

Related subjects

Engineering and Basic Sciences / Computer Engineering / Artificial Intelligence


Abstract

Geographic named entities can be classified into many sub-types that are useful for applications such as information extraction and question answering. In this paper, we present a high-performance bootstrapping algorithm with error correction heuristics and location normalization for the task of geographic named entity annotation with seven sub-types. Location normalization additionally resolves ambiguities among entities that share the same name and sub-type. In the initial stage, we annotate a raw corpus using a large set of seeds automatically selected from a gazetteer, so that the seed set's quality does not depend on a specific training corpus. From the initial annotation, boundary patterns reflecting phrasal context are learned and applied to the corpus again to obtain new annotations, which then pass through the error correction heuristics. As the bootstrapping loop proceeds, the number of annotated instances gradually increases, and the learned boundary patterns become richer and more accurate. Through experiments, we explore inter- and intra-phrasal context, which reflects the syntactic constraints of a named entity, and several heuristics for correcting annotation errors introduced by incomplete boundary patterns. The experiments show the effect of these strategies on the learning curve. When applied to a newspaper corpus, our bootstrapping approach achieved an F1 score of 89, and the proposed location normalization method achieved 95% accuracy at the instance level.
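The abstract does not spell out the algorithm, so the following Python sketch is only a rough illustration of the loop it describes: seed annotation from a gazetteer, learning boundary patterns from the current annotation, re-applying them to the corpus, and filtering new tags with error correction heuristics until nothing new is found. Every concrete choice here is an assumption, not the authors' method. Patterns are reduced to a single word of left or right context (the paper's phrasal-context patterns are richer), entities are single tokens, the thresholds `min_count` and `min_precision` are arbitrary, and the two heuristics (capitalized candidates only; discard a candidate whose matching patterns disagree on the sub-type) are hypothetical stand-ins. Location normalization is not modeled.

```python
from collections import Counter, defaultdict


def seed_annotate(sentences, gazetteer):
    """Initial stage: tag every token found in the gazetteer with its sub-type."""
    return {
        (s, t): gazetteer[tok]
        for s, sent in enumerate(sentences)
        for t, tok in enumerate(sent)
        if tok in gazetteer
    }


def context_keys(sent, t):
    """Hypothetical boundary pattern: one word of left and one of right context."""
    left = sent[t - 1] if t > 0 else "<s>"
    right = sent[t + 1] if t + 1 < len(sent) else "</s>"
    return [("L", left), ("R", right)]


def learn_patterns(sentences, ann, min_count=2, min_precision=0.8):
    """Keep context keys that are frequent and point mostly to one sub-type."""
    votes = defaultdict(Counter)
    for (s, t), subtype in ann.items():
        for key in context_keys(sentences[s], t):
            votes[key][subtype] += 1
    patterns = {}
    for key, counts in votes.items():
        subtype, n = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= min_count and n / total >= min_precision:
            patterns[key] = subtype
    return patterns


def apply_patterns(sentences, ann, patterns):
    """Propose new tags, filtered by two simple (hypothetical) heuristics."""
    new = {}
    for s, sent in enumerate(sentences):
        for t, tok in enumerate(sent):
            # Heuristic 1: only untagged, capitalized tokens are candidates.
            if (s, t) in ann or not tok[:1].isupper():
                continue
            proposed = {patterns[k] for k in context_keys(sent, t) if k in patterns}
            # Heuristic 2: drop candidates whose patterns vote for different sub-types.
            if len(proposed) == 1:
                new[(s, t)] = proposed.pop()
    return new


def bootstrap(sentences, gazetteer, max_iter=10):
    """Alternate pattern learning and application until no new tags appear."""
    ann = seed_annotate(sentences, gazetteer)
    for _ in range(max_iter):
        patterns = learn_patterns(sentences, ann)
        new = apply_patterns(sentences, ann, patterns)
        if not new:  # converged
            break
        ann.update(new)
    return ann


if __name__ == "__main__":
    corpus = [s.split() for s in [
        "He flew to Paris yesterday", "She flew to Berlin yesterday",
        "They flew to Lyon yesterday", "The president of France spoke",
        "The president of Germany spoke", "The president of Chile spoke",
    ]]
    gaz = {"Paris": "CITY", "Berlin": "CITY", "France": "COUNTRY", "Germany": "COUNTRY"}
    for (s, t), subtype in sorted(bootstrap(corpus, gaz).items()):
        print(corpus[s][t], subtype)
```

In this toy run, "Lyon" and "Chile" are not in the gazetteer but are picked up in the first iteration through the learned contexts "to _" / "_ yesterday" and "of _" / "_ spoke". A token such as "Madrid" in "They flew to Madrid spoke" would be rejected by the conflict heuristic, because its left context suggests CITY and its right context suggests COUNTRY.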

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Systems - Volume 32, Issue 4, June 2007, Pages 575–592

Authors

Seungwoo Lee, Gary Geunbae Lee
