کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
530688 | 869782 | 2014 | 10 صفحه PDF | دانلود رایگان |
• The Arabic word descriptor (AWD) is introduced for lexicon reduction and shape indexing.
• The AWD seamlessly integrates shape, subwords and diacritics information.
• The lexicon reduction system has been combined with two word classifiers.
• Experiments were performed on the IFN/ENIT and Ibn Sina databases.
Word recognition systems use a lexicon to guide the recognition process in order to improve the recognition rate. However, as the lexicon grows, the computation time increases. In this paper, we present the Arabic word descriptor (AWD) for Arabic word shape indexing and lexicon reduction in handwritten documents. It is formed in two stages. First, the structural descriptor (SD) is computed for each connected component (CC) of the word image. It describes the CC shape using the bag-of-words model, where each visual word represents a different local shape structure, extracted from the image with filters of different patterns and scales. Then, the AWD is formed by sorting and normalizing the SDs. This emphasizes the symbolic features of Arabic words, such as subwords and diacritics, without performing layout segmentation. In the context of lexicon reduction, the AWD is used to index a reference database. Given a query image, the reduced lexicon is obtained from the labels of the first entries in the indexed database. This framework has been tested on Arabic word databases. It has a low computational overhead, while providing a compact descriptor, with state-of-the-art results for lexicon reduction on the Ibn Sina and IFN/ENIT databases.
Journal: Pattern Recognition - Volume 47, Issue 10, October 2014, Pages 3477–3486