Article ID: 4947770 | Journal: Neurocomputing | Published Year: 2017 | Pages: 23 | File Type: PDF
Abstract
Document image binarization is the conversion of a document image into a binary image. For broken and severely degraded document images, binarization is very challenging. Unlike traditional methods that separate the foreground from the background, this paper presents a new framework that both binarizes broken and degraded document images and restores their quality. In the pre-processing step, our approach extends the non-local means method and uses it to remove noise from the input document image. The proposed method then binarizes the denoised image using the quick adaptive thresholding method proposed by Pierre D. Wellner. Finally, to obtain more visually pleasing results, the binarized image is post-processed with three operations: despeckling, preserving stroke connectivity, and improving the quality of text regions. Experimental results show a significant improvement in the binarization of broken and degraded document images collected from various sources, including degraded and broken books, magazines and document files.
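The abstract names Wellner's quick adaptive thresholding and a despeckling step but gives no parameters, so the following is only a rough, non-authoritative Python sketch of those two ideas, not the paper's pipeline. It assumes Wellner's commonly cited running-average rule: a window of s = width/8 pixels, a pixel counts as ink if it is more than t = 15% darker than the running mean, and rows are scanned in alternating directions. It also assumes that "despeckling" means discarding very small connected ink components. The function names `wellner_threshold` and `despeckle`, their defaults, and the 0 = ink / 1 = background convention are hypothetical, and the extended non-local means pre-processing, stroke-connectivity and text-quality steps are omitted.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 (black) .. 255 (white); binary uses 0 = ink, 1 = background


def wellner_threshold(gray: Image, s_frac: float = 1 / 8, t: float = 15.0) -> Image:
    """Hypothetical Wellner-style binarization using an approximate running mean along rows."""
    h = len(gray)
    w = len(gray[0]) if h else 0
    s = max(1, int(w * s_frac))  # window length in pixels (assumed width/8)
    out = [[1] * w for _ in range(h)]
    g = None  # running sum, carried across rows (boustrophedon scan)
    for y in range(h):
        # Alternate scan direction so each row continues from where the previous one ended.
        xs = range(w) if y % 2 == 0 else range(w - 1, -1, -1)
        for x in xs:
            p = gray[y][x]
            if g is None:
                g = float(p * s)  # seed as if preceded by s copies of the first pixel
            else:
                g = g - g / s + p  # approximate sum of the last s pixels
            # Ink if the pixel is more than t percent darker than the running mean g / s.
            if p < (g / s) * (100.0 - t) / 100.0:
                out[y][x] = 0
    return out


def despeckle(binary: Image, min_size: int = 3) -> Image:
    """Hypothetical despeckle: turn 8-connected ink components smaller than min_size into background."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    out = [row[:] for row in binary]
    seen = [[False] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            if out[y0][x0] != 0 or seen[y0][x0]:
                continue
            # Flood-fill one ink component with an explicit stack.
            stack = [(y0, x0)]
            seen[y0][x0] = True
            comp = []
            while stack:
                y, x = stack.pop()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and out[ny][nx] == 0:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            if len(comp) < min_size:
                for y, x in comp:
                    out[y][x] = 1  # erase the speck
    return out
```

For example, `despeckle(wellner_threshold(gray))` binarizes a grayscale image and then removes isolated ink pixels. A single dark pixel on a uniform light background is marked as ink by the threshold and then erased as a speck. The sketch uses plain Python lists to stay dependency-free. A real implementation would more likely operate on arrays and tune `s_frac`, `t` and `min_size` to the scan resolution.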