Article ID: 571031
Journal: Procedia Computer Science
Published: 2016
Pages: 9
Abstract

Text/image region separation is the process of locating the text regions and image regions in a scanned document image. It is particularly helpful for detecting the layout of a scanned document. The extracted text regions can be passed to optical character recognition (OCR), and they can also be used to label and train an automatic layout-learning system that detects the locations of titles, keywords, subheadings, paragraphs, images, and so on. When the boundaries between images and text are regular, profiling or morphological operations can separate the text and image regions and yield a correct document layout. Real-world documents, however, have irregular boundaries and noise, on which the usual profile-based methods and their heuristics often fail, leading to incorrect document layouts. This paper proposes using edge enhancement diffusion and a level set method to separate text and image regions in scanned document images. The results show that the proposed method works when a document contains multiple images, and that it detects the layout of a scanned document even when the image and text regions have irregular shapes.
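The abstract names two building blocks, edge enhancement diffusion and a level set method, without specifying which variants the paper uses. As a hedged illustration only, the sketch below substitutes two textbook choices: Perona-Malik anisotropic diffusion for the smoothing step and the two-phase Chan-Vese ("active contours without edges") model for the level set step. The function names, the parameters (`kappa`, `mu`, `dt`, `eps`) and the mean-threshold initialisation are assumptions for illustration, not the paper's method.

```python
import numpy as np

# Hedged sketch: Perona-Malik diffusion and Chan-Vese level sets stand in for
# the paper's unspecified "edge enhancement diffusion" and "level set method".


def perona_malik(img, n_iter=10, kappa=0.3, dt=0.2):
    """Perona-Malik anisotropic diffusion (explicit 4-neighbour scheme).

    The conductance g(d) = exp(-(d / kappa)**2) smooths small intensity
    differences (noise) but nearly stops diffusion across strong edges.
    dt <= 0.25 keeps the explicit update stable.
    """
    u = img.astype(float)  # astype returns a float copy
    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")  # replicate borders (no flux out)
        flux = np.zeros_like(u)
        for d in (p[:-2, 1:-1] - u, p[2:, 1:-1] - u,   # north, south
                  p[1:-1, 2:] - u, p[1:-1, :-2] - u):  # east, west
            flux += np.exp(-(d / kappa) ** 2) * d
        u += dt * flux
    return u


def curvature(phi):
    """div(grad(phi) / |grad(phi)|) using central differences."""
    gy, gx = np.gradient(phi)  # axis 0 is y (rows), axis 1 is x (columns)
    norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8  # avoid division by zero
    dny_dy = np.gradient(gy / norm, axis=0)
    dnx_dx = np.gradient(gx / norm, axis=1)
    return dnx_dx + dny_dy


def chan_vese(u, n_iter=100, mu=0.2, dt=0.5, eps=1.0):
    """Two-phase Chan-Vese level set segmentation by gradient descent.

    Evolves phi by
        dphi/dt = delta_eps(phi) * [mu * curvature - (u - c1)^2 + (u - c2)^2]
    where c1 and c2 are the mean intensities inside (phi > 0) and outside.
    Returns the boolean region {phi > 0}.
    """
    u = u.astype(float)
    phi = u - u.mean()  # assumption: start from a mean-intensity threshold
    for _ in range(n_iter):
        inside = phi > 0
        c1 = u[inside].mean() if inside.any() else 0.0
        c2 = u[~inside].mean() if (~inside).any() else 0.0
        delta = eps / (np.pi * (eps ** 2 + phi ** 2))  # smoothed Dirac delta
        force = mu * curvature(phi) - (u - c1) ** 2 + (u - c2) ** 2
        phi = phi + dt * delta * force
    return phi > 0
```

Because Chan-Vese is a two-phase model, which region ends up as `phi > 0` depends on the initialisation; here `phi` starts positive on pixels brighter than the image mean. The curvature weight `mu` trades boundary smoothness against fidelity to irregular region shapes.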
