Article code: 4969580
Journal code: 1449974
Publication year: 2018
English article: 18 pages
Full-text version: PDF
English title of the article (ISI)
Comparative study of conventional time series matching techniques for word spotting
Keywords
Word spotting, Degraded historical documents, Handwritten documents, George Washington dataset, Bentham dataset, Japanese handwriting recognition
Related subjects
Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition
Abstract (English)


- Experiments with 32 sequence matching techniques, using a simple, classical word spotting architecture.
- Many of these techniques had never been tested for word spotting, yet they show interesting word spotting results.
- Each sequence matching technique is explained in detail, so that the idea behind it can be understood quickly.
- Experiments are performed on six historical datasets of different kinds (handwritten and printed).
- Experimental results are explained and analyzed, and important conclusions are drawn on which algorithms to use in a given context.
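
The highlights mention a simple, classical word spotting architecture: each word image becomes a sequence of feature vectors, the query sequence is matched against every candidate, and the candidates are ranked by distance. A minimal sketch of that pipeline, assuming binary word images stored as NumPy arrays (1 = ink), three hypothetical per-column features (ink density, upper gap, lower gap) and classical DTW with a Euclidean local cost. The function names and the feature set are illustrative, not the paper's exact setup.

```python
import numpy as np


def column_features(img: np.ndarray) -> np.ndarray:
    """Hypothetical per-column features of a binary word image (1 = ink).

    Returns shape (width, 3): ink density, gap above the topmost ink pixel
    and gap below the bottommost ink pixel, each divided by the image height.
    This feature set is an assumption chosen for illustration.
    """
    h, w = img.shape
    feats = np.zeros((w, 3))
    for x in range(w):
        rows = np.flatnonzero(img[:, x])
        feats[x, 0] = rows.size / h                  # fraction of ink pixels
        if rows.size:
            feats[x, 1] = rows[0] / h                # gap above the top ink pixel
            feats[x, 2] = (h - 1 - rows[-1]) / h     # gap below the bottom ink pixel
        else:
            feats[x, 1:] = 1.0                       # empty column: conventional value
    return feats


def dtw(a, b) -> float:
    """Classical DTW with Euclidean local cost, normalized by len(a) + len(b)."""
    a = np.asarray(a, float).reshape(len(a), -1)
    b = np.asarray(b, float).reshape(len(b), -1)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # vertical, horizontal and diagonal predecessors
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m] / (n + m))


def rank_candidates(query_img, candidate_imgs):
    """Return candidate indices sorted from best (smallest DTW) to worst."""
    q = column_features(query_img)
    scores = [dtw(q, column_features(c)) for c in candidate_imgs]
    return sorted(range(len(candidate_imgs)), key=lambda k: scores[k])
```

Dividing by len(a) + len(b) is one common way to keep scores comparable between candidates of different widths; other normalizations are possible.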

In the word spotting literature, many approaches have treated word images as temporal signals that can be matched with the classical Dynamic Time Warping (DTW) algorithm. Consequently, DTW has been widely used as an off-the-shelf tool. However, many improved versions of DTW exist, along with other robust sequence matching techniques. Very few of them have been studied extensively in the context of word spotting, whereas they have been well explored in other application domains such as speech processing and data mining. The motivation of this paper is to investigate this area in order to extract significant and useful information for users of such techniques. More precisely, this paper presents a comparative study of the classical DTW technique, many of its improved variants, and other sequence matching techniques in the context of word spotting, considering both their theoretical and their experimental properties. The experimental study is performed on historical documents, both handwritten and printed, at the word or line segmentation level, and with a limited or extended set of queries. The comparative analysis shows that classical DTW remains a good choice when there are no segmentation problems in word extraction. Its constrained version (e.g., with the Itakura parallelogram) seems better on handwritten data, and the Hilbert transform also shows promising performance on both handwritten and printed datasets. For printed data and low-level features (based on pixel columns), the aggregation of features (e.g., Piecewise-DTW) also seems very important. Finally, when there are significant word segmentation errors, or when working at the line segmentation level, Continuous Dynamic Programming (CDP) seems to be the best choice.
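
The abstract singles out constrained DTW (e.g., with the Itakura parallelogram) and the aggregation of features (e.g., Piecewise-DTW). A minimal sketch of both ideas, assuming the Itakura parallelogram is applied as a global window with a maximum slope of 2 on top of the usual three-predecessor DTW recursion, and that aggregation means averaging consecutive frames in fixed-size segments (piecewise aggregate approximation) before matching. The function names, the slope and the segment length are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def itakura_mask(n: int, m: int, slope: float = 2.0) -> np.ndarray:
    """Boolean (n, m) mask of cells inside an Itakura-style parallelogram.

    Cell (i, j) is allowed when both the path from (0, 0) to (i, j) and the
    path from (i, j) to (n-1, m-1) can keep a slope between 1/slope and slope.
    """
    i = np.arange(n)[:, None]
    j = np.arange(m)[None, :]
    ri, rj = n - 1 - i, m - 1 - j          # remaining steps to the end cell
    return ((j <= slope * i) & (i <= slope * j)
            & (rj <= slope * ri) & (ri <= slope * rj))


def constrained_dtw(a, b, slope: float = 2.0) -> float:
    """DTW restricted to the parallelogram; returns inf if no path fits."""
    a = np.asarray(a, float).reshape(len(a), -1)
    b = np.asarray(b, float).reshape(len(b), -1)
    n, m = len(a), len(b)
    mask = itakura_mask(n, m, slope)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if mask[i - 1, j - 1]:          # cells outside the window stay inf
                cost = np.linalg.norm(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m] / (n + m))


def paa(x, seg_len: int) -> np.ndarray:
    """Average consecutive frames in segments of seg_len (last one may be shorter)."""
    x = np.asarray(x, float).reshape(len(x), -1)
    return np.array([x[k:k + seg_len].mean(axis=0) for k in range(0, len(x), seg_len)])


def piecewise_dtw(a, b, seg_len: int = 4, slope: float = 2.0) -> float:
    """Aggregate both sequences first, then run the constrained DTW."""
    return constrained_dtw(paa(a, seg_len), paa(b, seg_len), slope)
```

Averaging seg_len frames shortens each sequence by roughly that factor, so the quadratic DTW table shrinks by roughly its square; the window additionally rejects pairs whose lengths differ by more than the slope allows.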

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition - Volume 73, January 2018, Pages 47-64
Authors
, , , ,