Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4943141 | Expert Systems with Applications | 2017 | 39 Pages | |
Abstract
One of the difficulties in understanding document images is document layout analysis, the first step in document image modeling. In this paper, we propose a robust system that addresses this problem by using a multilevel homogeneity structure within a hybrid methodology. The system consists of three main stages: classification, segmentation, and refinement and labeling. Unlike other page segmentation methods, the proposed system includes an efficient algorithm for detecting table regions in document images. In addition, to make the system practical, it is designed to work with documents in a variety of languages. The proposed method was tested on the ICDAR2015 competition dataset (RDCL-2015) and three other published datasets in different languages. The results show that the proposed system is more accurate than previous methods on these datasets.
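The abstract names the three stages but not their internals. As a minimal sketch only, assuming each connected component of the page arrives as a bounding box, the Python below chains three stand-in heuristics in that order: a median-height test for text versus non-text (stage 1), grouping of vertically overlapping text components into lines (stage 2), and merging each line into a labeled region (stage 3). `Box`, `classify`, `segment`, `refine_and_label`, `analyze`, the `ratio=3.0` threshold, and the heuristics themselves are hypothetical; they are not the authors' multilevel homogeneity method, and table detection is omitted.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Box:
    """Bounding box of one connected component (hypothetical input format)."""
    x: int
    y: int
    w: int
    h: int

    @property
    def bottom(self) -> int:
        return self.y + self.h


def classify(boxes, ratio=3.0):
    """Stage 1 (assumed heuristic): components much taller than the median
    height are treated as non-text; everything else is treated as text."""
    med = median(b.h for b in boxes)
    text = [b for b in boxes if b.h <= ratio * med]
    non_text = [b for b in boxes if b.h > ratio * med]
    return text, non_text


def segment(text_boxes):
    """Stage 2 (assumed heuristic): scan components top to bottom and put a
    component on the current line if it starts above that line's lowest edge."""
    lines = []
    for b in sorted(text_boxes, key=lambda c: c.y):
        if lines and b.y < max(c.bottom for c in lines[-1]):
            lines[-1].append(b)
        else:
            lines.append([b])
    return lines


def union(boxes):
    """Smallest box enclosing all the given boxes."""
    x0 = min(b.x for b in boxes)
    y0 = min(b.y for b in boxes)
    x1 = max(b.x + b.w for b in boxes)
    y1 = max(b.bottom for b in boxes)
    return Box(x0, y0, x1 - x0, y1 - y0)


def refine_and_label(lines, non_text):
    """Stage 3: merge each line into one region, attach a label, and return
    the regions in reading order (top to bottom, then left to right)."""
    regions = [("text", union(line)) for line in lines]
    regions += [("non-text", b) for b in non_text]
    return sorted(regions, key=lambda r: (r[1].y, r[1].x))


def analyze(boxes):
    """Run the three stages in sequence; an empty page yields no regions."""
    if not boxes:
        return []
    text, non_text = classify(boxes)
    return refine_and_label(segment(text), non_text)
```

For example, `analyze([Box(10, 10, 8, 10), Box(20, 11, 8, 9), Box(10, 30, 8, 10), Box(50, 60, 100, 80)])` merges the first two components into one text line, keeps the third as a second line, and labels the tall fourth component as non-text. Keeping each stage as a separate function mirrors the staged structure described in the abstract, so any stage could be swapped for a stronger method without touching the others.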
Authors
Tuan Anh Tran, Kanghan Oh, In-Seop Na, Guee-Sang Lee, Hyung-Jeong Yang, Soo-Hyung Kim