کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
523839 868503 2016 11 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Bit-parallel approximate pattern matching: Kepler GPU versus Xeon Phi
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نرم افزارهای علوم کامپیوتر
پیش نمایش صفحه اول مقاله
Bit-parallel approximate pattern matching: Kepler GPU versus Xeon Phi
چکیده انگلیسی


• Advanced SIMD features on GPUs and Xeon Phis promote efficient long pattern search.
• A tiled approach to accelerating the Wu–Manber algorithm on GPUs has been proposed.
• Both the GPU and Xeon Phi yield two orders-of-magnitude speedup over one CPU core.
• The GPU-based version with tiling runs up to 2.9 × faster than the Xeon Phi version.

Approximate pattern matching (APM) targets to find the occurrences of a pattern inside a subject text allowing a limited number of errors. It has been widely used in many application areas such as bioinformatics and information retrieval. Bit-parallel APM takes advantage of the intrinsic parallelism of bitwise operations inside a machine word. This approach typically encodes non-deterministic finite automaton (NFA) states or value differences between adjacent cells of a dynamic programming matrix in the form of bit arrays. Wu–Manber (WM) is a well-known bit-parallel APM algorithm, which simulates an NFA and gains parallel efficiency by performing multiple state updates within a machine word. An important parameter is the machine word size (e.g. 32 or 64 bits for CPUs). Due to increasing vector capabilities, efficient mapping of bit-parallel APM algorithms onto modern high performance computing architectures is an interesting research topic. Prominent examples are Xeon Phi coprocessors and CUDA-enabled GPUs, which provide words of size 512 bits (by means of vector registers) and 1024 bits (by means of warps), respectively. In this paper, we investigate mappings of the WM algorithm onto these two accelerator types. Both architectures are able to achieve around two orders-of-magnitude speedups compared to a single-threaded CPU implementation. Moreover, our tile-based implementation on a GeForce Titan graphics card runs up to 2.9 × faster than our implementation on an Intel Xeon Phi 5110P. Source code is available at http://xbitpar.sourceforge.net.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Parallel Computing - Volume 54, May 2016, Pages 128–138
نویسندگان
, , ,