Article ID: 6939972
Journal: Pattern Recognition
Published Year: 2016
Pages: 13 Pages
File Type: PDF
Abstract
Text/non-text classification in online handwritten documents is crucial to text recognition, text search, and diagram interpretation. However, it is a challenging problem because of the large amount of variation and the lack of prior knowledge. To solve this problem, we propose using global and local contexts to build a high-performance classifier. The classifier assigns a text or non-text label to each stroke in the stroke sequence of a digital ink document. First, a neural network architecture is used to acquire the complete global context of the stroke sequence. Then, a simple but effective model based on a marginal distribution is used to capture the local temporal context of adjacent strokes and improve the sequence labeling result. Experiments on available heterogeneous online handwritten document databases demonstrate the superiority and effectiveness of our context combination approach. Our method achieved classification rates of 99.04% and 98.30% on the Kondate (written in Japanese) and IAMonDo (written in English) heterogeneous document databases, respectively. These results are significantly better than others reported in the literature.
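The two-stage idea in the abstract (per-stroke scores from a global sequence model, then refinement using adjacent strokes) can be illustrated with a minimal sketch. The abstract does not specify the model details, so everything here is an assumption for illustration: the function name `smooth_text_probs`, the `stay_prob` parameter, and the use of forward-backward smoothing over a two-state chain are stand-ins, not the authors' marginal-distribution model. The per-stroke text probabilities are treated as if they came from some global classifier and are used directly as emission scores.

```python
def smooth_text_probs(text_probs, stay_prob=0.9):
    """Refine per-stroke text probabilities using adjacent strokes.

    Hypothetical sketch: forward-backward smoothing over a two-state chain
    (0 = non-text, 1 = text). `text_probs` are assumed outputs of a global
    classifier; `stay_prob` (assumed, 0 < stay_prob < 1) is the chance that
    two adjacent strokes share the same label.
    """
    n = len(text_probs)
    if n == 0:
        return []

    # Per-stroke scores for (non-text, text), used as emission scores.
    emis = [(1.0 - p, p) for p in text_probs]
    switch = 1.0 - stay_prob
    trans = ((stay_prob, switch), (switch, stay_prob))

    # Forward pass: evidence from strokes up to and including stroke i.
    fwd = [None] * n
    z = emis[0][0] + emis[0][1]
    fwd[0] = (emis[0][0] / z, emis[0][1] / z)
    for i in range(1, n):
        cur = [
            emis[i][s] * sum(fwd[i - 1][r] * trans[r][s] for r in (0, 1))
            for s in (0, 1)
        ]
        z = cur[0] + cur[1]
        fwd[i] = (cur[0] / z, cur[1] / z)

    # Backward pass: evidence from strokes after stroke i.
    bwd = [None] * n
    bwd[n - 1] = (1.0, 1.0)
    for i in range(n - 2, -1, -1):
        cur = [
            sum(trans[s][r] * emis[i + 1][r] * bwd[i + 1][r] for r in (0, 1))
            for s in (0, 1)
        ]
        z = cur[0] + cur[1]
        bwd[i] = (cur[0] / z, cur[1] / z)

    # Combine both directions into a smoothed text probability per stroke.
    smoothed = []
    for i in range(n):
        non_text = fwd[i][0] * bwd[i][0]
        text = fwd[i][1] * bwd[i][1]
        smoothed.append(text / (non_text + text))
    return smoothed


# Illustrative input: one uncertain stroke surrounded by confident text strokes.
probs = [0.95, 0.90, 0.45, 0.92, 0.97]
smoothed = smooth_text_probs(probs)
labels = ["text" if p >= 0.5 else "non-text" for p in smoothed]
```

In this sketch, a stroke that the global scores leave slightly below the decision threshold is pulled toward the label of its confident neighbors. Setting `stay_prob=0.5` disables the local context and returns the input probabilities unchanged.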
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition