کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
8960220 1646392 2019 34 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Information extraction from historical handwritten document images with a context-aware neural model
ترجمه فارسی عنوان
استخراج اطلاعات از تصاویر سند دست نویس تاریخی با یک مدل عصبی مصنوعی متنی
کلمات کلیدی
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر چشم انداز کامپیوتر و تشخیص الگو
چکیده انگلیسی
Many historical manuscripts that hold trustworthy memories of the past societies contain information organized in a structured layout (e.g. census, birth or marriage records). The precious information stored in these documents cannot be effectively used nor accessed without costly annotation efforts. The transcription driven by the semantic categories of words is crucial for the subsequent access. In this paper we describe an approach to extract information from structured historical handwritten text images and build a knowledge representation for the extraction of meaning out of historical data. The method extracts information, such as named entities, without the need of an intermediate transcription step, thanks to the incorporation of context information through language models. Our system has two variants, the first one is based on bigrams, whereas the second one is based on recurrent neural networks. Concretely, our second architecture integrates a Convolutional Neural Network to model visual information from word images together with a Bidirecitonal Long Short Term Memory network to model the relation among the words. This integrated sequential approach is able to extract more information than just the semantic category (e.g. a semantic category can be associated to a person in a record). Our system is generic, it deals with out-of-vocabulary words by design, and it can be applied to structured handwritten texts from different domains. The method has been validated with the ICDAR IEHHR competition protocol, outperforming the existing approaches.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Pattern Recognition - Volume 86, February 2019, Pages 27-36
نویسندگان
, , , ,