Article code: 531631
Journal code: 869863
Publication year: 2007
English article: 9-page PDF
Full text: free download
English title of the ISI article
OCR binarization and image pre-processing for searching historical documents
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition
English abstract

We consider the problem of document binarization as a pre-processing step for optical character recognition (OCR) for the purpose of keyword search of historical printed documents. A number of promising techniques from the literature for binarization, pre-filtering, and post-binarization denoising were implemented along with newly developed methods for binarization: an error diffusion binarization, a multiresolutional version of Otsu's binarization, and denoising by despeckling. The OCR in the ABBYY FineReader 7.1 SDK is used as a black box metric to compare methods. Results for 12 pages from six newspapers of differing quality show that performance varies widely by image, but that the classic Otsu method and Otsu-based methods perform best on average.
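
For readers unfamiliar with the classic Otsu method named in the abstract, the following is a minimal sketch of global Otsu thresholding in pure Python. It is an illustration under stated assumptions, not the paper's implementation: the function names `otsu_threshold` and `binarize` are hypothetical, the image is assumed to be a flat sequence of 8-bit grayscale values, and the paper's multiresolutional variant, error diffusion, and despeckling steps are not reproduced here.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form the dark class; pixels > t form the light class.
    Assumes integer gray levels in the range [0, levels).
    """
    # Build the gray-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class (<= t)
    sum0 = 0  # sum of intensities in the dark class

    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break     # light class empty; no further split is possible

        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t

    return best_t


def binarize(pixels: Sequence[int], t: int) -> List[int]:
    """Map dark pixels (ink) to 0 and light pixels (background) to 255."""
    return [0 if p <= t else 255 for p in pixels]


# Synthetic example: dark "ink" values and light "paper" values.
sample = [20] * 50 + [30] * 50 + [200] * 100 + [220] * 100
threshold = otsu_threshold(sample)
binary = binarize(sample, threshold)
```

On a real scanned page, the same threshold would be computed from the histogram of the whole image, and the binarized result would then be passed to the OCR engine.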

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition - Volume 40, Issue 2, February 2007, Pages 389–397
Authors
, , ,