Article ID | Journal | Published Year | Pages |
---|---|---|---|
10322280 | Expert Systems with Applications | 2015 | 12 |
Abstract
The approach is based on estimating the distribution of expected error versus transformation cost. First, a model that predicts the probability that a given cost arises from an erroneously transcribed string is computed from a sample of supervised OCR hypotheses. Then, given a test sample, a cumulative error vs. cost curve is computed and used to automatically set the threshold that meets the user-defined error rate on the overall sample. Experiments on batches from different writing styles show very accurate error rate estimations where fixed thresholding clearly fails. An original procedure for generating distorted strings from a given language is also proposed and tested; it allows the presented method to be used in tasks where no real supervised OCR hypotheses are available to train the system.
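The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fit_error_model` and `choose_threshold`, the fixed cost bins, the Laplace smoothing, and the definition of the error rate as expected accepted errors divided by the total number of test strings are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter


def fit_error_model(costs, is_error, bin_edges):
    """Estimate P(error | cost bin) from supervised (cost, is_error) pairs.

    bin_edges is a sorted list of cut points; a cost c falls in bin
    bisect_right(bin_edges, c). Binning is an assumption of this sketch.
    """
    n_bins = len(bin_edges) + 1
    totals = [0] * n_bins
    errors = [0] * n_bins
    for cost, err in zip(costs, is_error):
        b = bisect_right(bin_edges, cost)
        totals[b] += 1
        errors[b] += int(err)
    # Laplace smoothing (an assumption) avoids 0/0 in empty bins.
    return [(errors[b] + 1) / (totals[b] + 2) for b in range(n_bins)]


def choose_threshold(test_costs, p_error, bin_edges, target_error_rate):
    """Return the largest cost threshold whose expected error rate, over the
    whole test sample, stays within target_error_rate (None if no threshold
    qualifies). Strings with cost <= threshold are accepted.
    """
    n = len(test_costs)
    counts = Counter(test_costs)  # group ties so a threshold never splits them
    expected_errors = 0.0
    threshold = None
    for cost in sorted(counts):
        p = p_error[bisect_right(bin_edges, cost)]
        expected_errors += counts[cost] * p  # cumulative expected errors
        if expected_errors / n > target_error_rate:
            break
        threshold = cost
    return threshold
```

With such a sketch, the threshold adapts to each test batch: a batch with many high-cost strings reaches the target error rate at a lower threshold than a clean batch, which is the behaviour a single fixed threshold cannot provide.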
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
J. Ramon Navarro-Cerdan, Joaquim Arlandis, Rafael Llobet, Juan-Carlos Perez-Cortes