Article code: 536551
Journal code: 870558
Publication year: 2010
English article: 14-page PDF
Full-text version: free download
English title of the ISI article
Semi-supervised learning for text-line detection
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
Abstract

Automatically detecting text-lines in document images has long been studied. However, most researchers today focus on boosting the detection rate rather than on noise removal. In this paper, we propose a semi-supervised learning framework that aims to segment Manhattan-layout documents with significant levels of noise. The algorithm consists of three steps: first, an initial segmentation process uses a seed-filling algorithm; second, an iterative grouping process uses projection profiles to estimate the vertical border of the page contents; third, a noise-removal step inside the page content uses online training and classification.

We test our algorithm on two databases. The first is the University of Washington (UW)-III database, with 1,600 images of varying input quality, which has been widely used by the Document Analysis Research (DAR) community to measure the performance of segmentation algorithms. The second is the NILE database, created by sampling from 320 journal pages in East Asian, East European, and Middle Eastern languages. The results show that our framework achieves competitive performance in both page-frame-level and text-line-level segmentation, and is particularly strong at filtering noise. They also show that our algorithm adapts better to language variations.
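
The first two steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`seed_fill_components`, `vertical_projection`, `content_borders`), the 4-connectivity, the `min_ink` threshold, and the reading of "vertical border" as the left and right content columns are all assumptions made for illustration. The iterative grouping and the online training and classification step are not reproduced here.

```python
from collections import deque


def seed_fill_components(img):
    """Find 4-connected ink regions (pixel value 1) by breadth-first seed filling.

    Returns one bounding box (top, left, bottom, right) per region,
    in the order the regions are first met in a row-by-row scan.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # (y, x) is a new seed: flood outwards from it.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and img[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def vertical_projection(img):
    """Count ink pixels in each column of the binary image."""
    return [sum(column) for column in zip(*img)]


def content_borders(profile, min_ink=1):
    """Return (first, last) column whose ink count reaches min_ink, or None.

    min_ink is a hypothetical threshold, not a value from the paper.
    """
    columns = [i for i, count in enumerate(profile) if count >= min_ink]
    return (columns[0], columns[-1]) if columns else None


# Tiny hand-made binary "page": 1 = ink, 0 = background.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 1, 0, 0],
]

boxes = seed_fill_components(page)
profile = vertical_projection(page)
borders = content_borders(profile)
```

In this sketch the bounding boxes stand in for the initial segmentation, and the column profile gives a single-pass estimate of where the page content begins and ends horizontally. A noisy scan would need a higher `min_ink` or smoothing of the profile before the borders are read off.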

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 31, Issue 11, 1 August 2010, Pages 1260–1273