Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
455674 | Computers & Electrical Engineering | 2013 | 12 Pages | |
• We introduce a high-performance embedded parallel computing architecture with unique memory access.
• We adapt the bitonic sort algorithm to the ePUMA architecture.
• We propose an in-core bitonic sort and an intra-core bitonic sort for the new architecture.
Embedded Parallel computing architecture with Unique Memory Access (ePUMA) is a domain-specific, heterogeneous, 9-core embedded chip multiprocessor designed for low power and high silicon efficiency, targeting high-throughput DSP in emerging telecommunication and multimedia applications. Sorting is one of the most widely studied algorithmic problems, and a growing number of embedded applications also require efficient sorting. This paper proposes eSORT, an efficient bitonic sorting algorithm for the ePUMA DSP. The eSORT algorithm consists of two parts, an in-core sorting algorithm and an intra-core sorting algorithm, both adapted to the new architecture to take advantage of the features of the ePUMA platform. We implemented and evaluated eSORT on datasets of various sizes on the ePUMA multi-core DSP and compared its performance with that of the Cell BE processor, which has the same SIMD parallelization structure. The results show that bitonic sort on the ePUMA multi-core DSP achieves much better performance and scalability: compared with an optimized bitonic sort on the Cell BE, the in-core sort is on average 11 times faster and the intra-core sort 15 times faster.
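The abstract does not spell out how eSORT divides work between its in-core and intra-core phases. As a hypothetical illustration of the general two-level pattern only (not the authors' ePUMA implementation), the Python sketch below splits the input into `num_blocks` blocks, sorts each block independently as a stand-in for per-core work, and then finishes with the cross-block stages of a classic bitonic merging network. The names `two_level_bitonic_sort`, `_compare_exchange_stages` and `num_blocks` are illustrative assumptions, and the input length is assumed to be a power of two, as the classic bitonic network requires.

```python
def _compare_exchange_stages(a, k_start):
    """Run the bitonic merge stages k = k_start, 2*k_start, ..., len(a) in place."""
    n = len(a)
    k = k_start
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:           # compare-exchange distance within this stage
            # All n/2 compare-exchanges of one (k, j) stage are independent,
            # so they could be spread across SIMD lanes or cores.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Direction alternates between consecutive blocks of size k.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2


def two_level_bitonic_sort(values, num_blocks):
    """Hypothetical two-level bitonic sort: local block sorts, then cross-block merges."""
    a = list(values)
    n = len(a)
    if n == 0:
        return a
    if n & (n - 1) or num_blocks <= 0 or num_blocks & (num_blocks - 1) or num_blocks > n:
        raise ValueError("len(values) and num_blocks must be powers of two, num_blocks <= len(values)")
    block = n // num_blocks
    # Phase 1 (stand-in for a per-core sort): block b is sorted ascending if b is
    # even and descending if b is odd, so each neighbouring pair forms a bitonic
    # sequence, exactly the state the full network reaches after stage k = block.
    for b in range(num_blocks):
        lo, hi = b * block, (b + 1) * block
        a[lo:hi] = sorted(a[lo:hi], reverse=(b % 2 == 1))
    # Phase 2: the remaining bitonic merge stages combine the blocks.
    _compare_exchange_stages(a, 2 * block)
    return a
```

For example, `two_level_bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5], num_blocks=2)` returns `[1, 2, 3, 4, 5, 6, 7, 8]`. With `num_blocks` equal to the input length the local phase is trivial and the function reduces to the plain bitonic sorting network; with `num_blocks=1` it is just the local sort. The data-independent compare-exchange pattern of each stage is the usual reason bitonic sort is chosen for SIMD and multi-core hardware such as ePUMA and the Cell BE.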