Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4947770 | Neurocomputing | 2017 | 23 Pages |
Abstract
Document image binarization is the conversion of a document image into a binary image. For broken and severely degraded document images, binarization is very challenging. Unlike traditional methods that separate the foreground from the background, this paper presents a new framework that both binarizes broken and degraded document images and restores their quality. In the pre-processing step, an extended non-local means method removes noise from the input document image. The document image is then binarized using the quick adaptive thresholding proposed by Pierre D. Wellner. Finally, the binarized document image is post-processed to obtain more pleasing results. The post-processing step comprises three measures: de-speckling, preserving stroke connectivity, and improving the quality of text regions. Experimental results show significant improvement in the binarization of broken and degraded document images collected from various sources, including degraded and broken books, magazines, and document files.
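To illustrate the thresholding stage, the following is a minimal sketch of Wellner-style quick adaptive thresholding in plain Python. It is not the authors' implementation: the abstract does not describe how the paper adapts the technique, and the non-local means and post-processing steps are omitted. The function name `wellner_threshold`, the list-of-rows image representation, and the defaults `s` (about one eighth of the row width) and `t` (15 percent) are assumptions chosen for illustration.

```python
def wellner_threshold(image, s=None, t=15):
    """Binarize a grayscale image given as a list of rows of 0-255 values.

    Returns an image of the same size with 0 for ink and 255 for background.
    s is the moving-average window length; t is the percentage by which a
    pixel must be darker than the local average to count as ink.
    Both defaults are illustrative assumptions.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if s is None:
        s = max(width // 8, 1)
    factor = (100 - t) / 100.0

    out = [[255] * width for _ in range(height)]
    # Running sum approximating the sum of the last s pixels (starts at mid-gray).
    g = 127.0 * s
    # Running sums recorded at each column on the previous row.
    prev_row = [g] * width

    for y in range(height):
        # Serpentine scan: left to right on even rows, right to left on odd rows.
        cols = range(width) if y % 2 == 0 else range(width - 1, -1, -1)
        for x in cols:
            p = image[y][x]
            g = g - g / s + p  # approximate moving-sum update
            # Average the horizontal estimate with the one from the row above.
            h = (g + prev_row[x]) / 2.0
            prev_row[x] = g
            if p < (h / s) * factor:
                out[y][x] = 0
    return out


# Example: a bright page with one dark vertical stroke at column 8.
page = [[200] * 16 for _ in range(4)]
for row in page:
    row[8] = 20
binary = wellner_threshold(page)
```

Because the threshold follows a running local average rather than a single global value, a sketch like this tolerates uneven illumination, which is one plausible reason the technique suits degraded documents.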
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Yiping Chen, Liansheng Wang,