Article ID: 4942785
Journal: Engineering Applications of Artificial Intelligence
Published Year: 2017
Pages: 13
File Type: PDF
Abstract
This paper proposes a document indexing method for keyword spotting based on semi-Markov conditional random fields (semi-CRFs), which provide a theoretical framework for fusing information from different contexts. The candidate segmentation-recognition lattice is first augmented using the linguistic context to improve recognition results. To speed up retrieval and save storage space, the lattice is then pruned by a forward-backward procedure. In the reduced lattice, we estimate character similarity scores with the semi-CRF model. The parameters of the semi-CRF model are estimated using a binary classification objective, namely the cross-entropy (CE), to discriminate between candidate characters in the lattice. To locate misrecognized character instances in the lattice, we use easily confused similar characters as proxies and search for these proxy characters in the index file. The proxy-character-driven search significantly improves performance compared with our previous character-synchronous dynamic search (CSDS) method. Experimental results on the online handwriting database CASIA-OLHWDB demonstrate the effectiveness of the proposed method.
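As an illustration of the forward-backward pruning step mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a lattice whose nodes are segmentation points numbered in left-to-right order and whose edges carry a candidate character and a log-score; the names `Edge`, `prune_lattice`, and `min_posterior` are hypothetical. An edge is kept only if its posterior probability, computed from the forward and backward scores, reaches the threshold.

```python
import math
from collections import namedtuple

# Hypothetical edge type: a candidate character spanning two segmentation points.
Edge = namedtuple("Edge", "start end label log_score")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-scores."""
    if not values:
        return -math.inf
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def prune_lattice(edges, num_nodes, min_posterior):
    """Keep edges whose posterior is at least min_posterior (must be > 0).

    Assumes nodes 0..num_nodes-1 are topologically ordered, so every edge
    satisfies start < end; node 0 is the source and the last node the sink.
    """
    last = num_nodes - 1

    # Forward pass: alpha[n] is the log-sum of scores of all paths 0 -> n.
    alpha = [-math.inf] * num_nodes
    alpha[0] = 0.0
    for node in range(1, num_nodes):
        alpha[node] = logsumexp(
            [alpha[e.start] + e.log_score for e in edges if e.end == node]
        )

    # Backward pass: beta[n] is the log-sum of scores of all paths n -> sink.
    beta = [-math.inf] * num_nodes
    beta[last] = 0.0
    for node in range(last - 1, -1, -1):
        beta[node] = logsumexp(
            [e.log_score + beta[e.end] for e in edges if e.start == node]
        )

    log_z = alpha[last]
    if log_z == -math.inf:
        return []  # no complete path through the lattice

    log_threshold = math.log(min_posterior)
    kept = []
    for e in edges:
        # Log posterior of the edge: mass of all paths through it over total mass.
        log_post = alpha[e.start] + e.log_score + beta[e.end] - log_z
        if log_post >= log_threshold:
            kept.append(e)
    return kept
```

The threshold trades index size against recall: a higher `min_posterior` yields a smaller lattice to store and search, at the risk of discarding the correct candidate character.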
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence