کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
6875115 | 689011 | 2016 | 8 صفحه PDF | دانلود رایگان |
عنوان انگلیسی مقاله ISI
A software scheduling solution to avoid corrupted units on GPUs
دانلود مقاله + سفارش ترجمه
دانلود مقاله ISI انگلیسی
رایگان برای ایرانیان
موضوعات مرتبط
مهندسی و علوم پایه
مهندسی کامپیوتر
نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
چکیده انگلیسی
Massively parallel processors provide high computing performance by increasing the number of concurrent execution units. Moreover, the transistor technology evolves to higher density, higher frequency and lower voltage. The combination of these factors increases significantly the probability of hardware failures. In this paper, we present a methodology to locate and mitigate hardware failures of Nvidia GPUs. Results show that intermittent errors can be precisely localized and have a limited impact to a well defined architecture tile. Therefore, we propose, and demonstrate on a software prototype, a rescheduling strategy to quarantine the defective hardware and ensure correct execution. Our approach significantly improves the GPU fault-tolerance capability and GPU's lifespan, at a reasonable overhead.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Parallel and Distributed Computing - Volumes 90â91, April 2016, Pages 1-8
Journal: Journal of Parallel and Distributed Computing - Volumes 90â91, April 2016, Pages 1-8
نویسندگان
David Defour, Eric Petit,