Article code: 536363
Journal code: 870505
Publication year: 2014
English article: 11-page PDF
Full-text version: free download
English title of the ISI article
Text line extraction for historical document images
Persian translation of the title
استخراج خط متن برای تصاویر سند تاریخی
Keywords
Seam carving, Line extraction, Multilingual, Signed distance transform, Dynamic programming, Handwriting
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
English abstract


• We present a language-independent global method for automatic text line extraction.
• It uses the seam carving method to determine the medial and separating seams.
• In binary images, it extracts lines as components which intersect the medial seam.
• In gray-scale images, it extracts lines as stripes between the separating seams.
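
The seam computation named in the highlights above can be illustrated with a minimal sketch of the classic seam carving dynamic program. It finds one left-to-right seam of minimal cumulative energy, where the seam may move at most one row up or down between neighboring columns. The function name `min_energy_horizontal_seam` and the toy `energy` grid are assumptions made for illustration; the paper's actual energy map (derived from a distance transform) and its procedure for extracting multiple seams are not reproduced here.

```python
from typing import List


def min_energy_horizontal_seam(energy: List[List[float]]) -> List[int]:
    """Return one row index per column for the cheapest left-to-right seam.

    `energy` is a hypothetical rows x cols grid; lower values are cheaper.
    """
    rows, cols = len(energy), len(energy[0])
    # cost[r][c]: minimal cumulative energy of any seam ending at (r, c)
    cost = [[0.0] * cols for _ in range(rows)]
    # back[r][c]: row chosen in column c - 1 on the best seam ending at (r, c)
    back = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        cost[r][0] = energy[r][0]
    for c in range(1, cols):
        for r in range(rows):
            # A seam may come from the row above, the same row, or the row below.
            candidates = [p for p in (r - 1, r, r + 1) if 0 <= p < rows]
            best = min(candidates, key=lambda p: cost[p][c - 1])
            cost[r][c] = energy[r][c] + cost[best][c - 1]
            back[r][c] = best
    # Start from the cheapest end point and walk back to the first column.
    r = min(range(rows), key=lambda i: cost[i][cols - 1])
    seam = [0] * cols
    for c in range(cols - 1, -1, -1):
        seam[c] = r
        r = back[r][c]
    return seam


# Toy energy map: the cheap cells form a path through rows 1, 2, 2, 1.
toy_energy = [
    [5, 5, 5, 5],
    [1, 5, 5, 1],
    [5, 1, 1, 5],
]
print(min_energy_horizontal_seam(toy_energy))  # [1, 2, 2, 1]
```

The table-filling pass costs time proportional to the number of pixels, which is why dynamic programming is the usual choice for seam computation.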

In this paper we present a language-independent global method for automatic text line extraction. The proposed approach computes an energy map of a text image and determines the seams that pass across and between text lines. We have developed two algorithms along this novel idea, one for binary images and the other for grayscale images. The first algorithm works on binary document images and assumes it is possible to extract the components along text lines. The seam passes along the middle of a text line, l, and marks the components that make up the letters and words of l. The algorithm then assigns each unmarked component to the closest text line. The second algorithm works directly on grayscale document images. It computes the distance transform directly from the grayscale image and generates two types of seams: medial seams and separating seams. The medial seams determine the text lines and the separating seams define the upper and lower boundaries of these text lines. Moreover, we present a new benchmark dataset of historical document images with various types of challenges. The dataset contains ground truth for text line extraction and includes samples in different languages, such as Arabic, English and Spanish. A binary dataset is used to test the binary algorithm. We ran experiments with both algorithms on these datasets and report segmentation accuracy. We also compare our algorithms with state-of-the-art text line segmentation methods.
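
The component-marking step of the binary algorithm can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: components are given as lists of (row, col) pixels, each medial seam is a list holding one row index per column, and "closest" is taken to be the smallest vertical pixel-to-seam distance. A distance of zero means the component intersects the seam. The name `assign_components` is hypothetical, and a component touching several seams is simply given to the first one, which sidesteps any handling of components that span more than one line.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, col)


def assign_components(components: List[List[Pixel]],
                      seams: List[List[int]]) -> List[int]:
    """Return, for each component, the index of the text line it belongs to."""
    labels = []
    for pixels in components:
        # Vertical distance from the component to each seam; 0 means the
        # component intersects that seam (the "marked" case).
        dists = [min(abs(r - seam[c]) for r, c in pixels) for seam in seams]
        # Ties go to the lowest seam index (an arbitrary choice in this sketch).
        labels.append(min(range(len(seams)), key=lambda i: dists[i]))
    return labels


# Two flat medial seams at rows 2 and 7 across a 4-column image.
toy_seams = [[2, 2, 2, 2], [7, 7, 7, 7]]
toy_components = [
    [(2, 0), (3, 0)],  # intersects seam 0
    [(5, 1)],          # unmarked, nearer to seam 1
    [(7, 3)],          # intersects seam 1
    [(0, 2)],          # unmarked, nearer to seam 0
]
print(assign_components(toy_components, toy_seams))  # [0, 1, 1, 0]
```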

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 35, 1 January 2014, Pages 23–33
Authors
, , ,