Article code: 494679; Journal code: 862802; Publication year: 2016; English full text: 9 pages, PDF
English title (ISI article)
Predicting the effectiveness of pattern-based entity extractor inference
Keywords
String similarity metrics, information extraction, genetic programming, hardness estimation
Related subjects
Engineering and Basic Sciences; Computer Engineering; Computer Science Software
Abstract


• Pattern-based entity extraction is an essential component of many digital workflows.
• No methods exist for predicting the accuracy of extractor generators that learn from examples.
• We propose a predictor based on string similarity and machine learning.
• In-depth experiments on real and challenging data give promising results.

An essential component of any workflow leveraging digital data is the identification and extraction of relevant patterns from a data stream. We consider a scenario in which an extraction inference engine automatically generates an entity extractor from examples of the desired behavior. These examples take the form of user-provided annotations of the entities to be extracted from a dataset. We propose a methodology for predicting the accuracy of the extractor that may be inferred from the available examples. We propose several prediction techniques and analyze them experimentally in depth, with reference to extractors consisting of regular expressions. The results suggest that reliable predictions for tasks of practical complexity can be obtained quickly, without actually generating the entity extractor.
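The abstract does not specify which similarity features the predictor uses. As a hedged illustration of the string-similarity side only, the following minimal sketch uses the Python standard library. It computes a normalized Levenshtein similarity between user-annotated entities and summarizes the pairwise values as a small feature vector.

Several choices here are assumptions made for illustration, not the authors' method:

- the function names `levenshtein`, `similarity` and `similarity_features`,
- the choice of edit distance as the metric,
- the mean/min/max summary.

```python
from itertools import combinations
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means the strings are identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def similarity_features(entities: list[str]) -> dict[str, float]:
    """Hypothetical feature set: summary statistics of pairwise similarity
    among the annotated entities. The intuition (an assumption, not taken
    from the paper) is that homogeneous examples are easier to generalize from."""
    pairs = [similarity(a, b) for a, b in combinations(entities, 2)]
    if not pairs:
        raise ValueError("need at least two annotated entities")
    return {"mean": mean(pairs), "min": min(pairs), "max": max(pairs)}
```

For example, `similarity_features(["2016-09-01", "2015-12-31", "2014-01-07"])` summarizes a set of date-like annotations. In a full predictor, such features would be one input to a learned model that maps them to an expected extractor accuracy. How that model is built is not stated in the abstract, although the keywords mention genetic programming.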


Publisher
Database: Elsevier - ScienceDirect
Journal: Applied Soft Computing - Volume 46, September 2016, Pages 398–406
Authors
, , , ,