Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
397233 | Information Systems | 2016 | 21 Pages |
•We developed a memory-constrained hash join suitable for column-store main-memory databases.•We utilize a block nested loops approach, performing a hash join between blocks that uses a compact hash table to lower the memory footprint.•We tackle the challenges of parallelizing the join algorithm while being mindful of operating within a tight memory constraint.•In scenarios without memory constraints our solution performs competitively with other contemporary non-hardware tuned join algorithms.•In memory-constrained scenarios, our solution is up to 3 times faster than another high-performance memory-constrained hash join.
This paper presents a memory-constrained hash join algorithm (PaMeCo Join) designed to operate with main-memory column-store database systems. Whilst RAM has become more affordable and the popularity of main-memory database systems continues to grow, we recognize that RAM is a finite resource and that database systems rarely have an excess of memory available to them. Therefore, we design PaMeCo to operate within an arbitrary memory limitation by processing the input relations by parts, and by using a compact hash table that represents the contained tuples in a compact format. Coupled with a radix-clustering system that lowers memory latencies, we find that PaMeCo can offer competitive performance levels to other contemporary hash join algorithms in an unconstrained environment, while being up to three times faster than a high-performing hash join when memory constraints are applied.