Article code: 410104
Journal code: 679124
Publication year: 2013
English article: 8-page PDF
Full-text version: free download
English title of the ISI article
A dual process redundancy approach to transient fault tolerance for ccNUMA architecture
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

Transient faults are a critical concern for the reliability of microprocessor systems. Software fault tolerance is more flexible and lower in cost than hardware fault tolerance. Moreover, as architectural trends point toward multicore designs, there is substantial interest in adapting parallel and redundant hardware resources for transient fault tolerance. This paper proposes a process-level fault tolerance technique, a software-centric approach that efficiently schedules and synchronizes redundant processes by exploiting the processor redundancy of ccNUMA architectures. It thereby improves the efficiency of running redundant processes and reduces time and space overhead. The paper focuses on methods for detecting and handling errors in the redundant processes. A real prototype, designed to be transparent to the application, is implemented. Test results show that the system detects, in a timely manner, CPU and memory soft errors that cause exceptions in the redundant processes, while ensuring that the application's services remain uninterrupted and are delayed only briefly.

Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 122, 25 December 2013, Pages 50–57