Tensor representation learning based image patch analysis for text identification and recognition

Article ID	Journal	Published Year	Pages	File Type
532028	Pattern Recognition	2015	14 Pages	PDF

Abstract

• A novel model, tensor representation learning based image patch analysis (TRL-IPA), is proposed for document understanding.
• TRL-IPA is built on a general formulation of the convergent tensor representation learning (CTRL) algorithms.
• The CTRL algorithms are theoretically guaranteed to converge to a local optimal solution of the learning problem.
• Extensive experiments demonstrate the superiority of TRL-IPA over related vector and tensor representation based approaches.

In this paper, we introduce a novel framework for text identification and recognition, called tensor representation learning based image patch analysis (TRL-IPA). Unlike most previous text identification approaches, which can only be applied to binarized images, TRL-IPA can be applied directly to gray-level and color images. TRL-IPA is built on a general formulation of the convergent tensor representation learning (CTRL) algorithms. In the implementation of TRL-IPA, image patches are represented as tensors, and low-dimensional representations of these tensors are learned via a CTRL algorithm. To identify text regions in new document images, a random forest classifier is trained in the learned tensor subspace. Moreover, the TRL-IPA framework can be applied straightforwardly to recognition problems, such as handwritten digit recognition. We conducted extensive experiments on ancient Chinese, Arabic and Cyrillic document images to evaluate TRL-IPA on text identification tasks. Experimental results demonstrate its effectiveness and robustness. In addition, recognition results on images of handwritten digits show its advantage over state-of-the-art vector and tensor representation based approaches.
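
The first two stages of the pipeline described in the abstract (patches kept as tensors, then projected into a low-dimensional tensor subspace) can be illustrated with a minimal sketch. The abstract does not specify the CTRL update rules, so the code below substitutes a generic alternating mode-wise eigen-decomposition (in the style of GLRAM/MPCA) as a stand-in; it is not the authors' algorithm. The function names, patch size, stride, and rank are all assumptions chosen for illustration, and only gray-level (second-order) patches are handled.

```python
import numpy as np


def extract_patches(image, size, stride):
    """Cut a 2-D gray-level image into square patches (second-order tensors)."""
    h, w = image.shape
    patches = [
        image[r:r + size, c:c + size]
        for r in range(0, h - size + 1, stride)
        for c in range(0, w - size + 1, stride)
    ]
    return np.stack(patches).astype(float)  # shape: (n_patches, size, size)


def top_eigvecs(scatter, rank):
    """Return the eigenvectors of a symmetric matrix with the largest eigenvalues."""
    _, vecs = np.linalg.eigh(scatter)  # eigh returns eigenvalues in ascending order
    return vecs[:, ::-1][:, :rank]


def fit_mode_projections(patches, rank, n_iter=5):
    """Learn row and column projections U, V by alternating eigen-decompositions.

    Stand-in for a tensor representation learning step (assumption: this is a
    GLRAM/MPCA-style procedure, NOT the CTRL algorithm from the paper).
    """
    mean = patches.mean(axis=0)
    X = patches - mean
    size = X.shape[2]
    V = np.eye(size)[:, :rank]  # initial column-mode projection
    for _ in range(n_iter):
        XV = X @ V                                   # (n, rows, rank)
        U = top_eigvecs(np.einsum('nik,njk->ij', XV, XV), rank)
        XU = np.transpose(X, (0, 2, 1)) @ U          # (n, cols, rank)
        V = top_eigvecs(np.einsum('nik,njk->ij', XU, XU), rank)
    return mean, U, V


def project(patches, mean, U, V):
    """Compute the low-dimensional tensor U^T (X - mean) V for every patch."""
    return np.einsum('ir,nij,js->nrs', U, patches - mean, V)


# Usage on a synthetic image (random data, purely illustrative).
rng = np.random.default_rng(0)
image = rng.random((32, 32))
patches = extract_patches(image, size=8, stride=4)
mean, U, V = fit_mode_projections(patches, rank=3)
features = project(patches, mean, U, V).reshape(len(patches), -1)
```

Each row of `features` is a flattened low-dimensional tensor representation of one patch. In the framework described above, labeled features of this kind would then be used to train a random forest classifier that separates text patches from non-text patches; that classification step is omitted here.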

Keywords

Text recognition; Convergence