Article ID: 534026
Journal: Pattern Recognition Letters
Published: 2013
Pages: 8
File type: PDF
Abstract

• We introduce a novel method to remove false connected components from binary images.
• Our method is based on the error rate, a voting approach, and the binomial distribution.
• We tested our method on the outputs of binarization methods from DIBCO 2011 for historical documents.
• The F-measure increases by 0.5% and 5.65% for handwritten and printed documents, respectively.
• Our method is easy to implement and has a moderate computational cost.

In this article, we introduce a novel technique to remove binary artifacts. Given a gray-intensity image and its corresponding binary image, our method detects and removes connected components that are more likely to be background pixels. To this end, our method constructs an auxiliary image using the minimum-error-rate threshold and then computes the ratio of intersection between the connected components of the original binary image and the connected components of the auxiliary image. Connected components with a high ratio are considered true connected components, while the rest are removed from the output. We tested our method on the outputs of several binarization methods for historical documents (handwritten and printed). Our results are favorable and indicate that our method can improve the outputs of diverse binarization methods, with the largest improvement observed for printed documents. Our method is easy to implement, has a moderate computational cost, and has two parameters whose interpretation within the model makes them easy to select empirically.
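The pipeline described above (auxiliary image from a minimum-error-rate threshold, then an intersection ratio per connected component) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the minimum-error-rate threshold is the Kittler–Illingworth minimum-error criterion, that ink is dark (gray value at or below the threshold) and marked 1 in the binary image, that components use 8-connectivity, and that the "ratio of intersection" is the fraction of a binary component's pixels that are also foreground in the auxiliary image. The names `min_error_threshold`, `connected_components`, `remove_false_components`, and the single `min_ratio` parameter are hypothetical; the voting approach, the binomial distribution, and the two parameters mentioned in the highlights are not reproduced here.

```python
import math
from collections import deque


def min_error_threshold(gray):
    """Kittler-Illingworth minimum-error threshold for 8-bit gray values (assumed criterion)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    best_t, best_j = None, math.inf
    for t in range(255):
        lo, hi = range(0, t + 1), range(t + 1, 256)
        n1 = sum(hist[i] for i in lo)
        n2 = total - n1
        if n1 == 0 or n2 == 0:
            continue  # one class is empty
        mu1 = sum(i * hist[i] for i in lo) / n1
        mu2 = sum(i * hist[i] for i in hi) / n2
        var1 = sum((i - mu1) ** 2 * hist[i] for i in lo) / n1
        var2 = sum((i - mu2) ** 2 * hist[i] for i in hi) / n2
        if var1 <= 0 or var2 <= 0:
            continue  # degenerate class: criterion undefined
        p1, p2 = n1 / total, n2 / total
        # J(t) = 1 + 2[P1 ln s1 + P2 ln s2] - 2[P1 ln P1 + P2 ln P2], with 2 ln s = ln var
        j = (1 + p1 * math.log(var1) + p2 * math.log(var2)
             - 2 * (p1 * math.log(p1) + p2 * math.log(p2)))
        if j < best_j:
            best_t, best_j = t, j
    if best_t is None:
        raise ValueError("histogram too degenerate for minimum-error thresholding")
    return best_t


def connected_components(binary):
    """Return 8-connected foreground components as lists of (row, col) pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, comp = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and binary[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(comp)
    return comps


def remove_false_components(gray, binary, min_ratio=0.5):
    """Keep binary components whose pixels mostly fall on auxiliary-image foreground."""
    t = min_error_threshold(gray)
    # Auxiliary image: dark pixels (assumed ink) at or below the threshold become 1.
    aux = [[1 if v <= t else 0 for v in row] for row in gray]
    out = [[0] * len(row) for row in binary]
    for comp in connected_components(binary):
        ratio = sum(aux[r][c] for r, c in comp) / len(comp)
        if ratio >= min_ratio:  # hypothetical single-parameter acceptance rule
            for r, c in comp:
                out[r][c] = 1
    return out


if __name__ == "__main__":
    gray = [
        [200, 210, 200, 210, 200, 210],
        [200, 20, 30, 210, 200, 210],
        [200, 30, 20, 210, 200, 210],
        [200, 210, 200, 210, 200, 210],
    ]
    # A binarization that found the dark block plus a spurious blob on background.
    binary = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 1, 0],
        [0, 1, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    for row in remove_false_components(gray, binary):
        print(row)
```

In the toy example, the spurious two-pixel blob in column 4 lies on light background, so none of its pixels are foreground in the auxiliary image and it is dropped, while the dark 2×2 block is kept. The histogram scan recomputes class statistics for every candidate threshold, which is quadratic in the number of gray levels; cumulative sums would be the natural optimization for real page images.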
