Article ID Journal Published Year Pages File Type
536422 Pattern Recognition Letters 2013 10 Pages PDF
Abstract

Scene text detection could be formulated as a bi-label (text and non-text regions) segmentation problem. However, due to the high degree of intraclass variation of scene characters as well as the limited number of training samples, single information source or classifier is not enough to segment text from non-text background. Thus, in this paper, we propose a novel scene text detection approach using graph model built upon Maximally Stable Extremal Regions (MSERs) to incorporate various information sources into one framework. Concretely, after detecting MSERs in the original image, an irregular graph whose nodes are MSERs, is constructed to label MSERs as text regions or non-text ones. Carefully designed features contribute to the unary potential to assess the individual penalties for labeling a MSER node as text or non-text, and color and geometric features are used to define the pairwise potential to punish the likely discontinuities. By minimizing the cost function via graph cut algorithm, different information carried by the cost function could be optimally balanced to get the final MSERs labeling result. The proposed method is naturally context-relevant and scale-insensitive. Experimental results on the ICDAR 2011 competition dataset show that the proposed approach outperforms state-of-the-art methods both in recall and precision.

► MSERs are proved to be suitable for text detection in the experiment. ► Carefully designed effective features are used to train the basic MSERs classifier. ► MSERs-based graph mode is built to incorporate region and context information. ► MSERs are labeled as text or non-text by minimizing the cost function via graph cut.

Related Topics
Physical Sciences and Engineering Computer Science Computer Vision and Pattern Recognition
Authors
, , , , ,