Article ID: 534915
Journal: Pattern Recognition Letters
Published Year: 2008
Pages: 8
File Type: PDF
Abstract

In this paper, we report on the use of characteristic loci features to cluster printed Farsi subwords based on their holistic shapes. This yields a pictorial dictionary that can be used in a word recognition system to narrow the search space. The feature vectors are compressed using PCA. The k-means algorithm is used to cluster 113,340 subwords in 4 fonts and 3 sizes into 300 clusters. The minimum and maximum numbers of cluster members are 59 and 876, respectively. The mean of each cluster is used as its entry in the pictorial dictionary. To evaluate the clustering results, a minimum mean-distance classifier is used to test a set of 5000 subwords. Of these subwords, 78.71%, 99.01% and 100% fall within the closest, the five closest and the 10 closest clusters, respectively.
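
The dictionary lookup and the top-n evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 2-D vectors, the cluster labels "a", "b", "c" and the use of squared Euclidean distance are all assumptions made for illustration. The sketch also assumes the feature vectors are already PCA-compressed, that cluster membership is already known (so the k-means step is skipped), and that each test sample carries a reference cluster label to check against.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_dictionary(members_by_cluster):
    """Map each cluster id to the mean of its member vectors (the dictionary entry)."""
    dictionary = {}
    for cluster_id, members in members_by_cluster.items():
        count = len(members)
        dictionary[cluster_id] = [sum(column) / count for column in zip(*members)]
    return dictionary


def rank_clusters(vector, dictionary):
    """Return cluster ids ordered from the nearest mean to the farthest."""
    return sorted(dictionary, key=lambda cid: squared_distance(vector, dictionary[cid]))


def top_n_hit_rate(samples, dictionary, n):
    """Percentage of samples whose reference cluster is among the n nearest means."""
    hits = sum(
        1 for vector, reference in samples
        if reference in rank_clusters(vector, dictionary)[:n]
    )
    return 100.0 * hits / len(samples)


# Toy data (hypothetical): three clusters of 2-D feature vectors.
members_by_cluster = {
    "a": [[0.0, 0.0], [0.0, 2.0]],
    "b": [[4.0, 0.0], [6.0, 0.0]],
    "c": [[10.0, 10.0], [10.0, 12.0]],
}
dictionary = build_dictionary(members_by_cluster)

# Each test sample is (feature vector, reference cluster id); labels are assumed.
samples = [
    ([1.0, 1.0], "a"),
    ([3.0, 0.5], "a"),
    ([9.0, 9.0], "c"),
]

for n in (1, 2):
    print(f"top-{n} hit rate: {top_n_hit_rate(samples, dictionary, n):.2f}%")
```

In this toy setup the second sample lies closer to the mean of another cluster than to its own, so it counts as a hit only once the two nearest clusters are considered. This mirrors why the reported rate rises as more of the closest clusters are searched.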

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition