Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
462678 | Microprocessors and Microsystems | 2013 | 14 Pages |
A fully programmable vertex shader based on Transport Triggered Architecture (TTA) is proposed in this paper to provide high efficiency of performance and connectivity for embedded applications. At the architecture level, fine-grained data transport in TTA datapath and multi-threading method are adopted to exploit instruction and data level parallelism respectively in the graphics applications. The datapath connectivity can be optimized mainly by native architectural visible bypass in TTA and hybrid result re-collection schemes. At the shader core level, a novel SIMD multi-functional dot-production unit and an area efficient special function unit are introduced for floating-point processing. The proposed processor which achieves peak capacity of 1.5 GFLOPS and 125 Mvertices/s can totally acquire 17.6% reduction in hardware cost and can provide 1.3 times improvement in performance per logic cost ratio under a 0.18 μm CMOS process for real graphics benchmarks compared to previous expanded VLIW vertex processor.