Article ID: 530549
Journal: Pattern Recognition
Published: 2013
Length: 8 pages
Abstract

We present a new document image descriptor based on multi-scale runlength histograms. The descriptor does not rely on layout analysis and can be computed efficiently. We show that it achieves state-of-the-art results in both classification and retrieval on two very different public datasets. Moreover, we show how the descriptors can be compressed and binarized to make them suitable for large-scale applications: binary descriptors of as few as 16–64 bits achieve state-of-the-art classification results.
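To make the idea of a runlength histogram concrete, here is a minimal NumPy sketch. It assumes a binary page image (0 = white, 1 = black), counts maximal runs of each colour along four directions (horizontal, vertical, diagonal, anti-diagonal), quantizes run lengths into log2-spaced bins, and reads "multi-scale" as a spatial pyramid of 1x1, 2x2 and 4x4 grids whose per-cell histograms are concatenated. All of these choices, and the helper names (`run_lengths`, `quantize`, `lines_in_direction`, `runlength_histogram`, `multiscale_descriptor`), are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np

N_DIRECTIONS = 4  # horizontal, vertical, diagonal, anti-diagonal (assumed)
N_COLOURS = 2     # runs of white (0) and black (1) pixels


def run_lengths(line):
    """Return (values, lengths) of the maximal constant runs in a 1-D array."""
    line = np.asarray(line)
    if line.size == 0:
        return line, np.zeros(0, dtype=int)
    # A new run starts wherever a pixel differs from the previous one.
    change = np.flatnonzero(line[1:] != line[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [line.size]))
    return line[starts], ends - starts


def quantize(lengths, n_bins):
    """Log2-spaced bins: 1 -> 0, 2 -> 1, 3-4 -> 2, 5-8 -> 3, ...; long runs share the last bin."""
    bins = np.array([int(n - 1).bit_length() for n in lengths], dtype=int)
    return np.minimum(bins, n_bins - 1)


def lines_in_direction(img, direction):
    """Split a 2-D image into 1-D pixel lines along one scan direction."""
    rows, cols = img.shape
    if direction == "h":
        return list(img)
    if direction == "v":
        return list(img.T)
    if direction == "d":
        return [img.diagonal(k) for k in range(-rows + 1, cols)]
    if direction == "a":
        flipped = img[:, ::-1]
        return [flipped.diagonal(k) for k in range(-rows + 1, cols)]
    raise ValueError(f"unknown direction {direction!r}")


def runlength_histogram(img, n_bins=6):
    """L1-normalised run-length histogram of one region: colours x directions x bins."""
    hist = np.zeros((N_COLOURS, N_DIRECTIONS, n_bins))
    if img.size == 0:
        return hist.ravel()
    for d, direction in enumerate("hvda"):
        for line in lines_in_direction(img, direction):
            values, lengths = run_lengths(line)
            bins = quantize(lengths, n_bins)
            for colour in range(N_COLOURS):
                # hist[colour, d] is a view, so np.add.at updates hist in place.
                np.add.at(hist[colour, d], bins[values == colour], 1)
    return hist.ravel() / hist.sum()


def multiscale_descriptor(img, levels=(1, 2, 4), n_bins=6):
    """Concatenate per-cell histograms over a g x g grid for each pyramid level g."""
    parts = []
    for g in levels:
        for band in np.array_split(img, g, axis=0):
            for cell in np.array_split(band, g, axis=1):
                parts.append(runlength_histogram(cell, n_bins))
    return np.concatenate(parts)
```

With the assumed defaults the descriptor has (1 + 4 + 16) cells x 2 colours x 4 directions x 6 bins = 1008 dimensions. Note that no layout analysis is involved: every step operates directly on pixel runs, which is what makes this kind of descriptor cheap to compute.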

► We present a document image descriptor based on multi-scale runlength histograms.
► This descriptor does not need layout analysis and can be computed efficiently.
► We compress and binarize the descriptors to make them suitable for large-scale applications.
► State-of-the-art results on two public datasets.
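The compression and binarization step can be illustrated with a common baseline: project descriptors onto their top principal components, keep only the sign of each projection (one bit per component), and compare codes by Hamming distance. This is a minimal sketch under that assumption; PCA followed by sign thresholding is a swapped-in technique for illustration and is not necessarily the embedding used by the authors. The function names are hypothetical.

```python
import numpy as np


def fit_pca_binarizer(X, n_bits):
    """Learn the mean and top n_bits principal directions of training descriptors X (n x d)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_bits]


def binarize(X, mean, W):
    """Project centred descriptors onto W and keep the sign: 1 bit per direction."""
    return ((X - mean) @ W.T > 0).astype(np.uint8)


def pack(codes):
    """Pack 0/1 codes into bytes along each row (e.g. 64 bits -> 8 bytes)."""
    return np.packbits(codes, axis=1)


def hamming_distances(query_packed, db_packed):
    """Hamming distance from one packed code to every packed code in the database."""
    xor = np.bitwise_xor(db_packed, query_packed)
    return np.unpackbits(xor, axis=1).sum(axis=1)
```

Usage: after `mean, W = fit_pca_binarizer(train, 64)`, each document is stored as `pack(binarize(X, mean, W))`, i.e. 8 bytes, so a million documents take about 8 MB; `np.argsort(hamming_distances(q, db))[:k]` then returns the k nearest codes. Sign thresholding after PCA is attractive here because it needs no extra training beyond the projection, though methods that also rotate or balance the bits usually lose less accuracy at very short code lengths.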
