Article code: 534103
Journal code: 870216
Publication year: 2012
English article: 10-page PDF
English title of the ISI article
Modeling broken characters recognition as a set-partitioning problem
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
Abstract

This paper presents a novel technique for recognizing broken characters in degraded text documents by modeling the task as a set-partitioning problem (SPP). The technique searches for the optimal set partition of the connected components, in which each subset yields a reconstructed character. Because the objective function needed for optimal set partitioning is non-linear, we design an algorithm that we call Heuristic Incremental Integer Programming (HIIP). The algorithm employs integer programming (IP) in an incremental fashion, using heuristics to hasten convergence. The objective function is formulated from probability functions that reflect common OCR measurements: pattern resemblance, sizing conformity, and distance between connected components. We applied HIIP to degraded Thai and English text documents and achieved accuracy rates over 90%. We also compared HIIP against three competing algorithms and achieved higher accuracy in each case.


► Recognize both vertically and horizontally broken characters.
► Extraction and recognition accuracy rates are over 90%.
► Applied to Thai and English documents.
► Algorithm: Heuristic Incremental Integer Programming (HIIP).
► Only the probability functions need to be adjusted to reflect a specific language or script.
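
To make the set-partitioning view concrete, the following is a minimal sketch and not the paper's HIIP algorithm: it enumerates every partition of a handful of connected components by brute force and keeps the one with the highest total log-probability. The bounding-box representation, the names `partitions`, `toy_score`, and `best_partition`, and the width-only scoring function are all assumptions made for illustration; the abstract's objective also uses pattern resemblance and inter-component distance, which are omitted here.

```python
from math import log
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical component representation: (x_min, y_min, x_max, y_max).
Box = Tuple[int, int, int, int]


def partitions(items: Sequence[int]) -> Iterator[List[List[int]]]:
    """Yield every set partition of `items` (exhaustive, so small inputs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Either `first` forms a block of its own ...
        yield [[first]] + part
        # ... or it joins one of the existing blocks.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def toy_score(group: List[Box], char_width: float = 10.0) -> float:
    """Assumed stand-in for a probability function: size conformity only.

    Returns a value in (0, 1] that peaks when the merged width of the
    group equals the expected character width.
    """
    width = max(b[2] for b in group) - min(b[0] for b in group)
    return 1.0 / (1.0 + abs(width - char_width) / char_width)


def best_partition(
    boxes: Sequence[Box],
    score: Callable[[List[Box]], float] = toy_score,
) -> Tuple[List[List[int]], float]:
    """Return the partition of box indices with the highest summed log-score."""
    best: List[List[int]] = []
    best_ll = float("-inf")
    for part in partitions(list(range(len(boxes)))):
        ll = sum(log(score([boxes[i] for i in block])) for block in part)
        if ll > best_ll:
            best, best_ll = part, ll
    return sorted(sorted(block) for block in best), best_ll


if __name__ == "__main__":
    # Two narrow fragments of one broken character, then one intact character.
    boxes = [(0, 0, 4, 10), (6, 0, 10, 10), (14, 0, 24, 10)]
    print(best_partition(boxes))
```

On these three toy boxes the two narrow fragments end up in one subset and the intact character in another. The number of partitions grows as the Bell numbers, so exhaustive search of this kind is only feasible for a few components, which is one reason a dedicated search procedure is needed in practice.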

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 33, Issue 16, 1 December 2012, Pages 2270–2279