Article code: 523900
Journal code: 868525
Publication year: 2014
English article: 10 pages, PDF
Full-text version: free download
English title of the ISI article
Towards an immortal operating system in virtual environments
Related topics
Engineering and basic sciences, Computer engineering, Computer science software
English abstract


• We show how a commercial OS can be successfully recovered from a crash.
• Support from the virtualization layer (Hypervisor) can significantly help in diagnosis and recovery of the OS.
• We evaluate the time taken to automatically recover from an OS crash for different workloads.
• This technology can significantly reduce the downtime and maintenance costs in data centers.
• This technology can be easily integrated into the support operations of OS vendors.

Many OS crashes are caused by bugs in kernel extensions or device drivers, even when the OS itself has been tested rigorously. To make an OS immortal, we must resurrect the OS from these crashes. We present a novel OS-hypervisor infrastructure that allows automated and transparent OS crash diagnosis and recovery in a virtual environment. This infrastructure eliminates the need for reboots or checkpoint-restart mechanisms, which require preserving the states of critical applications before the crash happens and also require extensive modifications to those applications. At the core of our approach is a small hidden OS-repair-image that is dynamically created from the healthy running OS instance. When an OS crashes, the hypervisor dynamically loads this repair-image to perform diagnosis and repair. One repair strategy we have experimented with is to quarantine the offending process and resume the repaired OS automatically, without a reboot. Experimental evaluations demonstrated that it takes less than 3 s to recover from an OS crash. This approach can significantly reduce the downtime and maintenance costs in data centers, and is the first design and implementation of an OS-hypervisor combination capable of automatically resurrecting a crashed commercial server OS. In addition to online diagnosis and recovery, this infrastructure can also be used for offline diagnosis and can be incorporated into the technical support tools of the OS vendor. Additionally, we have used parts of this infrastructure to speed up the diagnosis of AIX OS crashes for the IBM technical support teams.
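
The recovery flow described in the abstract (build a repair image while the OS is healthy, load it on a crash, quarantine the offending process, resume without a reboot) can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the classes GuestOS, RepairImage and Hypervisor, and all of their methods, are hypothetical names invented for this example, and the "diagnosis" step simply trusts a recorded faulting process ID.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class GuestOS:
    """Toy stand-in for a guest OS instance (hypothetical, not a real OS API)."""
    processes: Dict[int, str] = field(default_factory=dict)  # pid -> name
    quarantined: Set[int] = field(default_factory=set)
    crashed: bool = False
    faulting_pid: Optional[int] = None

    def crash(self, pid: int) -> None:
        # Simulate a crash attributed to the given process.
        self.crashed = True
        self.faulting_pid = pid


class RepairImage:
    """Hypothetical repair image that the hypervisor loads after a crash."""

    def diagnose(self, guest: GuestOS) -> Optional[int]:
        # Assumption: diagnosis is trivial here and reads the recorded pid.
        return guest.faulting_pid

    def repair(self, guest: GuestOS) -> bool:
        pid = self.diagnose(guest)
        if pid is None or pid not in guest.processes:
            return False  # nothing we know how to repair
        guest.quarantined.add(pid)  # isolate the offending process
        guest.crashed = False       # resume the guest without a reboot
        guest.faulting_pid = None
        return True


class Hypervisor:
    """Hypothetical hypervisor that owns the hidden repair image."""

    def __init__(self, guest: GuestOS) -> None:
        self.guest = guest
        self.repair_image: Optional[RepairImage] = None

    def create_repair_image(self) -> None:
        # The image must be built while the guest is still healthy.
        if self.guest.crashed:
            raise RuntimeError("cannot build a repair image from a crashed guest")
        self.repair_image = RepairImage()

    def on_guest_crash(self) -> bool:
        # Load the repair image and let it diagnose and repair the guest.
        if self.repair_image is None:
            return False
        return self.repair_image.repair(self.guest)


if __name__ == "__main__":
    guest = GuestOS(processes={1: "init", 42: "faulty_client"})
    hv = Hypervisor(guest)
    hv.create_repair_image()
    guest.crash(42)
    print("recovered:", hv.on_guest_crash(), "quarantined:", guest.quarantined)
```

In this sketch the quarantined process stays in the process table but is flagged, which mirrors the idea of isolating the offender rather than restarting the whole system. A real system would have to inspect actual kernel state instead of a recorded field.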

Publisher
Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 40, Issue 9, October 2014, Pages 526–535