Article code: 523951
Journal code: 868534
Publication year: 2013
English article: 17 pages
Full-text version: PDF
English title of the ISI article
Efficient multimedia coprocessor with enhanced SIMD engines for exploiting ILP and DLP
Related topics
Engineering and Basic Sciences / Computer Engineering / Computer Science Software
English abstract


• A simple 8-way TTA coprocessor with cost-effective SIMD functional units can efficiently exploit ILP and DLP.
• Long vector capabilities built upon existing SIMD hardware achieve a better cost-performance tradeoff.
• A unified scalar and vector approach eliminates the conversion overhead between the two.
• The implicit data permutation mechanism addresses the conventional permutation limitations of SIMD architectures.
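The idea of building long vector capabilities on existing SIMD hardware, with scalars and vectors sharing one data path, can be illustrated with strip-mining. The following is a minimal sketch, not MCP's actual mechanism (the abstract does not describe it): it assumes a hypothetical 4-lane SIMD unit (`SIMD_WIDTH`), and the helper names `simd_add` and `long_vector_add` are illustrative. A long vector is processed as a sequence of fixed-width SIMD operations, and a scalar is simply a vector of length 1.

```python
from typing import List, Tuple

SIMD_WIDTH = 4  # hypothetical lane count of the underlying SIMD unit


def simd_add(a: List[float], b: List[float]) -> List[float]:
    """One fixed-width SIMD add: all lanes are computed by a single 'instruction'."""
    assert len(a) == len(b) <= SIMD_WIDTH
    return [x + y for x, y in zip(a, b)]


def long_vector_add(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """Add two vectors of arbitrary length by issuing SIMD_WIDTH-wide chunks.

    A scalar is treated as a vector of length 1, so the same path serves both.
    Returns the result and the number of SIMD operations issued.
    """
    if len(a) != len(b):
        raise ValueError("operand lengths differ")
    result: List[float] = []
    ops = 0
    for start in range(0, len(a), SIMD_WIDTH):
        # The final chunk may be shorter than SIMD_WIDTH (a partial, masked operation).
        result.extend(simd_add(a[start:start + SIMD_WIDTH], b[start:start + SIMD_WIDTH]))
        ops += 1
    return result, ops
```

For example, adding two 10-element vectors issues three 4-lane operations (4 + 4 + 2 elements), while a scalar add issues one. Exposing a long vector length to the programmer in this way lets the hardware width stay small and cheap, which is the cost-performance argument the highlights make.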

Multimedia applications have become increasingly important in daily computing. These applications consist of heterogeneous code regions that mix data-level parallelism (DLP) and instruction-level parallelism (ILP). A standard approach to building a multimedia coprocessor integrates single-instruction multiple-data (SIMD) engines into architectures that exploit ILP at compile time, such as very long instruction word (VLIW) and transport triggered architecture (TTA). However, while increasing the vector length improves performance in the DLP regions, the ILP regions do not scale with it. Furthermore, the register-to-register nature of SIMD instructions limits current SIMD engines in handling memory alignment, data reorganization, and control flow. Many supporting instructions, such as data permutations, address generation, and loop branches, are required to aid the execution of the actual SIMD computation instructions. To mitigate these problems, we propose optimized SIMD engines that combine VLIW or TTA processing with unified scalar and long-vector computation, together with efficient SIMD hardware for the actual computation. Our new architecture, called the multimedia coprocessor (MCP), is based on TTA. It includes the following features: (1) a simple coprocessor structure with an 8-way TTA, (2) cost-effective SIMD hardware capable of performing floating-point operations, (3) long vector capabilities built upon existing SIMD hardware, with a single register file and processor data path for both scalar operands and vector elements, and (4) an optimized SIMD architecture that addresses the SIMD limitations. Our experimental evaluations show that MCP outperforms conventional SIMD techniques by an average of 39% on multimedia kernels and 12% on multimedia applications.
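To make the alignment and data-reorganization overhead concrete, the following minimal sketch models a conventional register-to-register SIMD engine whose loads accept only aligned addresses. It shows the explicit-permutation workaround that the abstract identifies as a limitation, not MCP's implicit permutation mechanism (which the abstract does not detail). The register width `WIDTH` and the helpers `aligned_load`, `permute_concat`, and `unaligned_load` are hypothetical.

```python
from typing import List, Tuple

WIDTH = 4  # hypothetical SIMD register width, in elements


def aligned_load(mem: List[int], addr: int) -> List[int]:
    """Model of a SIMD load that only accepts WIDTH-aligned addresses."""
    if addr % WIDTH != 0:
        raise ValueError("unaligned SIMD load")
    return mem[addr:addr + WIDTH]


def permute_concat(lo: List[int], hi: List[int], shift: int) -> List[int]:
    """Select WIDTH elements starting at `shift` from the concatenation lo || hi."""
    return (lo + hi)[shift:shift + WIDTH]


def unaligned_load(mem: List[int], addr: int) -> Tuple[List[int], int]:
    """Emulate an unaligned load using aligned loads plus a permutation.

    Returns the loaded vector and the number of instructions spent.
    """
    shift = addr % WIDTH
    base = addr - shift
    if shift == 0:
        return aligned_load(mem, base), 1
    lo = aligned_load(mem, base)
    hi = aligned_load(mem, base + WIDTH)
    # Two loads plus one permute: supporting work around a single useful load.
    return permute_concat(lo, hi, shift), 3
```

With `mem = list(range(16))`, loading from address 5 costs three instructions and yields `[5, 6, 7, 8]`, whereas an aligned load from address 8 costs one. Overhead of this kind, repeated inside every loop iteration, is what the abstract's supporting instructions refer to.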

Publisher
Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 39, Issue 10, October 2013, Pages 586–602
Authors
, , , , ,