A statistical–topological feature combination for recognition of handwritten numerals

Journal: Applied Soft Computing
Published: 2012
Pages: 10
Article ID: 496372

Abstract

Principal Component Analysis (PCA) and Modular PCA (MPCA) are well-known statistical methods for recognition of facial images. However, PCA/MPCA alone is found to be insufficient to achieve the high classification accuracy required for handwritten character recognition, because these methods fail to represent certain local morphometric information present in character patterns. On the other hand, Quad-Tree based hierarchically derived Longest-Run (QTLR) features, a popular type of topological feature for character recognition, miss some global statistical information of the characters. In this paper, we introduce a new combination of PCA/MPCA and QTLR features for OCR of handwritten numerals. The performance of the feature combination is evaluated with a Support Vector Machine (SVM) classifier on handwritten numerals of five popular scripts of the Indian subcontinent, viz., Arabic, Bangla, Devanagari, Latin and Telugu. The results show that the MPCA + QTLR combination outperforms the PCA + QTLR combination and most other conventional features available in the literature.
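The abstract names QTLR features without defining them. Longest-run features are commonly described as follows: for each of four directions (rows, columns, diagonals, anti-diagonals), take the longest run of consecutive foreground pixels on every line, sum these lengths, and normalize. The quad-tree part repeats this on hierarchically subdivided regions of the image. The pure-Python sketch below is only an illustration under stated assumptions: regions are split into four equal quadrants at the midpoint, each sum is normalized by the region's area, and the default depth is 2. The paper's exact partitioning rule and normalization may differ, and all function names are hypothetical.

```python
from typing import List, Sequence

Image = List[List[int]]  # binary image: 1 = ink (foreground), 0 = background


def longest_run(line: Sequence[int]) -> int:
    """Length of the longest run of consecutive 1s along one line of pixels."""
    best = current = 0
    for px in line:
        current = current + 1 if px else 0
        best = max(best, current)
    return best


def directional_lines(img: Image) -> List[List[List[int]]]:
    """All rows, columns, diagonals and anti-diagonals of a rectangular image."""
    h, w = len(img), len(img[0])
    rows = [list(row) for row in img]
    cols = [[img[r][c] for r in range(h)] for c in range(w)]
    # Diagonals: cells sharing the same (c - r) offset k.
    diags = [[img[r][r + k] for r in range(h) if 0 <= r + k < w]
             for k in range(-(h - 1), w)]
    # Anti-diagonals: cells sharing the same (r + c) sum s.
    antis = [[img[r][s - r] for r in range(h) if 0 <= s - r < w]
             for s in range(h + w - 1)]
    return [rows, cols, diags, antis]


def longest_run_features(region: Image) -> List[float]:
    """Four features per region: summed longest runs per direction / region area (assumed)."""
    area = len(region) * len(region[0])
    return [sum(longest_run(line) for line in lines) / area
            for lines in directional_lines(region)]


def quadrants(region: Image) -> List[Image]:
    """Split a region into four quadrants at its midpoint (equal-split assumption)."""
    h, w = len(region), len(region[0])
    mh, mw = h // 2, w // 2
    return [[row[c0:c1] for row in region[r0:r1]]
            for r0, r1 in ((0, mh), (mh, h))
            for c0, c1 in ((0, mw), (mw, w))]


def qtlr_features(img: Image, depth: int = 2) -> List[float]:
    """Hypothetical QTLR sketch: longest-run features of every quad-tree node, levels 0..depth."""
    if len(img) < 2 ** depth or len(img[0]) < 2 ** depth:
        raise ValueError("image too small for the requested quad-tree depth")
    feats: List[float] = []
    regions = [img]
    for level in range(depth + 1):
        for region in regions:
            feats.extend(longest_run_features(region))
        if level < depth:
            regions = [q for region in regions for q in quadrants(region)]
    return feats
```

With depth 2 the sketch yields 4 × (1 + 4 + 16) = 84 values per image. Each quad-tree level adds finer local detail, which is the morphometric information the abstract says PCA/MPCA alone does not capture.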

Graphical abstract

Principal Component Analysis (PCA) and Modular PCA (MPCA), two well-known statistical methods used in many pattern recognition applications, are found to be inadequate for achieving high classification accuracy in handwritten numeral recognition, because they fail to represent certain local morphometric information present in character patterns. Conversely, hierarchically derived Quad-Tree based Longest-Run (QTLR) features, a popular topological feature descriptor, often fail to extract enough global statistical information from the character patterns. To bridge this gap, a combination of PCA/MPCA and QTLR features is used here, with a Support Vector Machine (SVM) classifier, to recognize handwritten numerals of five popular scripts of the Indian subcontinent, viz., Arabic, Bangla, Devanagari, Latin and Telugu.

Highlights

► We have developed a statistical–topological feature combination for recognition of Arabic, Bangla, Devanagari, Latin and Telugu numerals.
► PCA/MPCA-based statistical features are used.
► QTLR features are used as topological features to represent local morphometric information in a digit image.
► We have evaluated the performance of the combined feature set using an SVM classifier with an RBF kernel.
► Maximum recognition accuracies of 98.40%, 98.55%, 98.70%, 98.90% and 99.10% were obtained on the test sets of Arabic, Bangla, Devanagari, Latin and Telugu numerals, respectively.
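The highlights describe fusing PCA/MPCA statistical features with QTLR topological features and classifying the result with an RBF-kernel SVM. The pure-Python sketch below illustrates the statistical half and the fusion step under explicit assumptions. MPCA is taken to mean the modular scheme known from face recognition: each image is cut into a grid of equal blocks and a separate PCA is fitted per block position. Fusion is assumed to be plain concatenation. Power iteration with deflation stands in for a library eigensolver. The grid size, component count `k` and `gamma` are illustrative, not the paper's settings, and every class and function name is hypothetical. Training the SVM itself is left to a library; only the RBF kernel it would use is shown.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def split_blocks(img: Matrix, grid: int) -> List[Vector]:
    """Cut an image into grid x grid equal blocks (modules), each flattened row-major."""
    bh, bw = len(img) // grid, len(img[0]) // grid
    return [[img[r][c]
             for r in range(i * bh, (i + 1) * bh)
             for c in range(j * bw, (j + 1) * bw)]
            for i in range(grid) for j in range(grid)]


def fit_pca(data: Matrix, k: int, iters: int = 300, seed: int = 0) -> Tuple[Vector, Matrix]:
    """Mean and leading k principal axes of `data` (power iteration + deflation)."""
    n, d = len(data), len(data[0])
    mu = [sum(col) / n for col in zip(*data)]
    xc = [[x[j] - mu[j] for j in range(d)] for x in data]
    cov = [[sum(row[i] * row[j] for row in xc) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    axes: Matrix = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in this module
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        axes.append(v)
    return mu, axes


def project(x: Vector, mu: Vector, axes: Matrix) -> Vector:
    """Coordinates of the centred vector x along each principal axis."""
    return [sum(a[j] * (x[j] - mu[j]) for j in range(len(x))) for a in axes]


class ModularPCA:
    """Hypothetical MPCA sketch: one PCA per block position, projections concatenated."""

    def __init__(self, grid: int = 2, k: int = 2):
        self.grid, self.k = grid, k
        self.models: List[Tuple[Vector, Matrix]] = []

    def fit(self, images: List[Matrix]) -> "ModularPCA":
        per_image = [split_blocks(img, self.grid) for img in images]
        self.models = [fit_pca([blks[b] for blks in per_image], self.k)
                       for b in range(self.grid * self.grid)]
        return self

    def transform(self, img: Matrix) -> Vector:
        feats: Vector = []
        for block, (mu, axes) in zip(split_blocks(img, self.grid), self.models):
            feats.extend(project(block, mu, axes))
        return feats


def combine(statistical: Vector, topological: Vector) -> Vector:
    """Feature-level fusion by plain concatenation (an assumption, see lead-in)."""
    return statistical + topological


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """RBF kernel an SVM would use: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
```

In a full pipeline, `combine(mpca.transform(img), qtlr_vector)` would be computed for every training image and the resulting vectors passed to an off-the-shelf SVM (for example LIBSVM or scikit-learn's `SVC` with an RBF kernel). Fitting a separate PCA per block is what lets MPCA respond to local variations that a single whole-image PCA averages away.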

Keywords

MPCA; PCA; Statistical; Feature combination; Topological; Character recognition; SVM