Article ID: 487495
Journal: Procedia Computer Science
Published Year: 2015
Pages: 10 Pages
File Type: PDF
Abstract

In this paper, a novel HNSI (Handwritten Numeral Script Identification) framework is proposed to identify the script of document images containing numeral text written in any one of four popular Indic scripts, namely Bangla, Devanagari, Roman and Urdu. A dataset of 4000 word-level numeral images, equally distributed across the four scripts, was collected from individuals of varying age, sex and educational qualification. Spatial- and frequency-domain features are computed to form a 55-dimensional feature vector. For experimentation, the whole dataset is divided in a 2:1 ratio for training and testing. The performance of different classifiers is compared, and MLP (multilayer perceptron) is found to give the best accuracy for Four-script, Tri-script and Bi-script combinations.
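The experimental setup described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' code: it assumes the 2:1 split is applied per script (stratified) and rounded to whole images, neither of which the abstract states, and the names `script_combinations` and `stratified_split` are hypothetical helpers introduced here.

```python
import random
from itertools import combinations

SCRIPTS = ["Bangla", "Devanagari", "Roman", "Urdu"]
SAMPLES_PER_SCRIPT = 1000  # 4000 images, equally distributed over 4 scripts


def script_combinations(scripts):
    """Return every Bi-, Tri- and Four-script subset, keyed by subset size."""
    return {k: list(combinations(scripts, k)) for k in range(2, len(scripts) + 1)}


def stratified_split(labels, train_parts=2, test_parts=1, seed=0):
    """Split sample indices per class in a train_parts:test_parts ratio.

    Assumption: the split is stratified and rounded per class; the
    abstract only gives the overall 2:1 ratio.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_parts / (train_parts + test_parts))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# One placeholder label per image; real inputs would be 55-dimensional feature vectors.
labels = [script for script in SCRIPTS for _ in range(SAMPLES_PER_SCRIPT)]
train_idx, test_idx = stratified_split(labels)
combos = script_combinations(SCRIPTS)
```

With four scripts there are six Bi-script pairs, four Tri-script triples and one Four-script set, so a full comparison would evaluate each classifier on eleven script combinations.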
