Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
536434 | 870523 | 2013 | 10-page PDF | Free download |

In this paper, we propose a new font and size identification method for ultra-low resolution Arabic word images using a stochastic approach. The literature has shown that Arabic text recognition systems have difficulty handling multi-font and multi-size word images. This is due to the variability induced by some font families, in addition to the inherent difficulties of Arabic writing, including cursive representation, overlaps and ligatures. This work proposes an efficient stochastic approach to the problem of font and size recognition. Our method processes a word image with a fixed-length, overlapping sliding window. Each window is represented by 102 features whose distribution is captured by Gaussian Mixture Models (GMMs). We present three systems: (1) a font recognition system, (2) a size recognition system and (3) a font and size recognition system. We demonstrate the importance of identifying the font before recognizing the word images by comparing two multi-font Arabic OCR systems (cascading and global). The cascading system is about 23% better than the global multi-font system in terms of word recognition rate on the Arabic Printed Text Image (APTI) database, which is freely available to the scientific community.
► We present a study on font-family and font-size recognition within the framework of an a priori approach.
► The font and size systems are based on GMMs and applied to Arabic word images at ultra-low resolution.
► We show the benefit of font recognition by comparing two HMM-based word recognition systems.
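The sliding-window and GMM scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the window length, step, diagonal covariances, and the decision rule (choosing the class whose GMM gives the highest summed log-likelihood over all windows) are assumptions made for illustration, and the 102-feature extraction is not reproduced. The names `sliding_windows`, `gmm_log_likelihood` and `classify` are hypothetical.

```python
import math


def sliding_windows(width, win, step):
    """Column ranges (start, end) of fixed-length, overlapping windows.

    Windows overlap whenever step < win. Returns an empty list when the
    image is narrower than one window.
    """
    if win <= 0 or step <= 0:
        raise ValueError("win and step must be positive")
    return [(s, s + win) for s in range(0, width - win + 1, step)]


def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (an assumption)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one feature vector under a GMM.

    gmm is a list of (weight, mean, var) components. The log-sum-exp
    trick keeps the computation numerically stable.
    """
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(frames, models):
    """Return the best-scoring label and the per-label scores.

    frames: one feature vector per window.
    models: dict mapping a label (font, size, or font+size) to a GMM.
    """
    scores = {
        label: sum(gmm_log_likelihood(f, gmm) for f in frames)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy 2-dimensional features stand in for the paper's 102 features.
    models = {
        "FontA": [(1.0, [0.0, 0.0], [1.0, 1.0])],
        "FontB": [(0.5, [5.0, 5.0], [1.0, 1.0]),
                  (0.5, [6.0, 4.0], [1.0, 1.0])],
    }
    frames = [[0.1, -0.2], [0.3, 0.1]]
    print(sliding_windows(10, 4, 2))
    print(classify(frames, models)[0])
```

In this sketch the same `classify` function would serve all three systems named in the abstract, differing only in what the labels of `models` denote (fonts, sizes, or font and size pairs). In practice the GMM parameters would be estimated from training data, for example with expectation-maximization, rather than written by hand.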
Journal: Pattern Recognition Letters - Volume 34, Issue 2, 15 January 2013, Pages 209–218