Sorting on GPUs for large scale datasets: A thorough comparison

Article ID	Journal	Published Year	Pages	File Type
515641	Information Processing & Management	2012	15 Pages	PDF

Abstract

Although sort has been extensively studied in many research works, it still remains a challenge in particular if we consider the implications of novel processor technologies such as manycores (i.e. GPUs, Cell/BE, multicore, etc.). In this paper, we compare different algorithms for sorting integers on stream multiprocessors and we discuss their viability on large datasets (such as those managed by search engines). In order to fully exploit the potentiality of the underlying architecture, we designed an optimized version of sorting network in the K-model, a novel computational model designed to consider all the important features of many-core architectures. According to K-model, our bitonic sorting network mapping improves the three main aspects of many-core architectures, i.e. the processors exploitation, and the on-chip/off-chip memory bandwidth utilization. Furthermore we are able to attain a space complexity of Θ(1). We experimentally compare our solution with state-of-the-art ones (namely, Quicksort and Radixsort) on GPUs. We also compute the complexity in the K-model for such algorithms. The conducted evaluation highlight that our bitonic sorting network is faster than Quicksort and slightly slower than radix, yet being an in-place solution it consumes less memory than both algorithms.

Research highlights► Inspired by the attractive performance/cost ratio, several studies in diferent application fields adopt graphic processors for carrying out data-intensive and general-purpose tasks. We digress from the motivation of sorting effciently a large amount of data on modern GPUs to propose a novel sorting solution that is able to sort in-place an array of integers. We also present this novel data partitioning schema to map bitonic sorting network (BSN) on GPU fully exploiting its high bandwidth memory interface and its massive computing power. ► Exploiting the performance constraints of a novel computational model proposed by us, i.e. K-model, we design a method to improve the theoretical performance of sorting using butterffy networks. Our theoretical evaluation shows that following the guidelines of the method proposed improve the performance of bitonic sorting networks also outperforming other algorithms. As side result, this also provides an example showing that previous computational models are not suitable for modeling modern GPU architecture. ► We perform a detailed experimental evaluation of state-of-the-art techniques on GPU sorting and we compare them on diferent synthetic datasets of diferent size and generated with diferent distribution. After we present experiments showing the performance of a Blocked Sort-Based Indexer, modified with a GPU-based sort step, to index six diferent collections of Web documents. Results show the benefits of adopting in-place sorting solutions on large datasets.

Keywords

Computational model