Article code: 523885
Journal code: 868518
Publication year: 2015
English article: 14-page PDF
Full-text version: free download
English title of the ISI article
Combined hardware–software multi-parallel prefiltering on the Convey HC-1 for fast homology detection
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract


• We fully leveraged the Convey HC-1 hybrid system to run the tool HHblits.
• Our hardware-based design supports and accelerates prefiltering in HHblits.
• We developed a highly parallel hardware design for prefiltering a protein database.
• We achieved further speedup through a carefully workload-balanced approach.

Protein databases used in research are huge and still growing at a fast pace. Searching such a database for sequences similar (homologous) to a given query sequence requires many comparisons, and comparing the query against every database sequence with the well-known Smith–Waterman algorithm is very time-consuming. Hidden Markov Models offer a way to reduce the number of database entries and also make it possible to find distantly homologous sequences. The number of entries is reduced by clustering similar sequences into a single Hidden Markov Model. The bioinformatics tool HHblits uses this approach. To reduce the runtime further, HHblits applies two-level prefiltering to cut the number of time-consuming Viterbi comparisons. Even so, prefiltering itself is very time-consuming, and highly parallel architectures and huge bandwidth are required to process and transfer the massive amounts of data.

In this article, we present an approach that exploits the reconfigurable, hybrid computer architecture Convey HC-1 to offload the most time-consuming part. The Convey HC-1, with four FPGAs and a high memory bandwidth of up to 76.8 GB/s, serves as the platform of choice. Other bioinformatics applications have already been supported successfully by the HC-1. Limited only by FPGA size, we present a design that calculates four first-level prefiltering scores per FPGA concurrently, i.e. 16 calculations in total. The score of the query profile against each database sequence is calculated by a modified Smith–Waterman scheme that is internally parallelized 128 times. By contrast, the original implementation, based on the Streaming SIMD (Single Instruction, Multiple Data) Extensions (SSE), can exploit only 16-fold parallelism and is limited by memory bandwidth. By preloading the query profile, we transform the memory-bound implementation into a compute- and resource-bound FPGA design.
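To make the kind of score calculation described above concrete, here is a minimal sketch of a local-alignment score of a query profile against one database sequence. It uses the plain Smith–Waterman recurrence with a linear gap penalty. The modification used in the paper is not described in the abstract and is not reproduced here. The names `profile_local_score` and `make_profile`, the toy scores, and the gap penalty are illustrative assumptions, not taken from HHblits or from the FPGA design.

```python
def profile_local_score(profile, sequence, gap=4):
    """Best local alignment score of a query profile against one sequence.

    profile[i] maps a residue letter to the score of aligning that residue
    with query column i (a hypothetical toy layout, not the HHblits format).
    """
    n = len(profile)
    prev = [0] * (n + 1)  # DP row for the previous database residue
    best = 0
    for residue in sequence:
        curr = [0] * (n + 1)
        for i in range(1, n + 1):
            # Extend the diagonal with the profile score for this column.
            match = prev[i - 1] + profile[i - 1].get(residue, -1)
            # Local alignment: never drop below zero; gaps cost `gap` each.
            curr[i] = max(0, match, prev[i] - gap, curr[i - 1] - gap)
            best = max(best, curr[i])
        prev = curr
    return best


def make_profile(query, match=3, mismatch=-1):
    """Toy profile: each column rewards only the query's own residue."""
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    return [{a: (match if a == q else mismatch) for a in alphabet} for q in query]


if __name__ == "__main__":
    profile = make_profile("HEAG")
    for seq in ["HEAG", "PPHEAGPP", "WWWW"]:
        print(seq, profile_local_score(profile, seq))
```

Because the profile is a fixed table that is looked up once per cell, keeping it close to the compute units removes a memory access from the innermost loop. That is the intuition behind preloading the query profile, although how the actual hardware design lays out these cells is not stated in the abstract.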
We tightly integrated the FPGA-based coprocessor into the hybrid computing system by employing task parallelism for the two-level prefiltering. Despite much lower clock rates, the FPGAs outperform SSE-based execution in calculating the prefiltering scores by a factor of 7.9.
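The task parallelism mentioned above can be pictured as two stages that run at the same time and hand candidates from one to the other. The sketch below is only an illustration of that pattern with two Python threads and a queue. The abstract does not say how the work is divided, so running the first level as one task and the second level as another is an assumption, and `two_level_prefilter`, the scoring callables, and the thresholds are hypothetical names.

```python
import queue
import threading


def two_level_prefilter(database, first_score, second_score, t1, t2):
    """Return sequences that pass both prefilter levels.

    The two levels run as overlapping tasks: the first-level task stands in
    for the coprocessor, the second-level task for the host (an assumption).
    """
    handoff = queue.Queue()
    done = object()  # sentinel that marks the end of the stream
    passed = []

    def first_level():
        for seq in database:
            if first_score(seq) >= t1:
                handoff.put(seq)  # forward survivors immediately
        handoff.put(done)

    def second_level():
        while True:
            seq = handoff.get()
            if seq is done:
                break
            if second_score(seq) >= t2:
                passed.append(seq)

    workers = [
        threading.Thread(target=first_level),
        threading.Thread(target=second_level),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return passed


if __name__ == "__main__":
    db = ["AAAA", "AC", "ACGTACGT", "GG"]
    # Toy scores: sequence length for level one, count of "A" for level two.
    print(two_level_prefilter(db, len, lambda s: s.count("A"), t1=4, t2=2))
```

The second stage starts working as soon as the first survivor arrives, so neither stage waits for the other to finish the whole database. Balancing how much work each stage receives is what determines whether both stay busy.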

Publisher
Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 42, February 2015, Pages 4–17