کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
432132 688718 2008 15 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
A two-stage hardware scheduler combining greedy and optimal scheduling
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
A two-stage hardware scheduler combining greedy and optimal scheduling
چکیده انگلیسی

Greedy scheduling heuristics provide a low complexity and scalable albeit particularly sub-optimal strategy for hardware-based crossbar schedulers. In contrast, the maximum matching algorithm for Bipartite graphs can be used to provide optimal scheduling for crossbar-based interconnection networks with a significant complexity and scalability cost. In this paper, we show how maximum matching can be reformulated in terms of Boolean operations rather than the more traditional formulations. By leveraging the inherent parallelism available in custom hardware design, we reformulate maximum matching in terms of Boolean operations rather than matrix computations and introduce three maximum matching implementations in hardware. Specifically, we examine a Pure Logic Scheduler with three dimensions of parallelism, a Matrix Scheduler with two dimensions of parallelism and a Vector Scheduler with one dimension of parallelism. These designs reduce the algorithmic complexity for an N×NN×N network from O(N3)O(N3) to O(1)O(1), O(K)O(K), and O(KN)O(KN), respectively, where KK is the number of optimization steps. While an optimal scheduling algorithm requires K=2N−1K=2N−1 steps, by starting with our hardware-based greedy strategy to generate an initial schedule, our simulation results show that the maximum matching scheduler can achieve 99% of the optimal schedule when K=9K=9. We examine hardware and time complexity of these architectures for crossbar sizes of up to N=1024N=1024. Using FPGA synthesis results, we show that a greedy schedule for crossbars, ranging from 8×8 to 256×256, can be optimized in less than 20 ns per optimization step. For crossbars reaching 1024×1024 the scheduling can be completed in approximately 10 μs with current technology and could reach under 90 ns with future technologies.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Parallel and Distributed Computing - Volume 68, Issue 11, November 2008, Pages 1437–1451
نویسندگان
, , ,