Article code: 432766
Journal code: 689063
Publication year: 2012
English article: 17 pages
Full text: PDF
English title of the ISI article
Hierarchical RAID: Design, performance, reliability, and recovery
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
Abstract

Hierarchical RAID (HRAID) extends the RAID paradigm to mask the failure of whole Storage Nodes (SNs) or bricks, where each SN is a disk array with a certain RAID level. HRAIDk/ℓ with N SNs and M disks per SN tolerates k SN failures and ℓ disk failures per SN with Maximum Distance Separable (MDS) erasure codes, which introduce the minimum level of redundancy at each level. For N = M there are k internode and ℓ intranode check strips per SN, occupying the capacity of as many disks with storage redundancy (k+ℓ)/N, but a higher storage redundancy is required for M > N. HRAIDk/ℓ tolerates all patterns of up to d_min = (k+1)(ℓ+1) − 1 disk failures, while some patterns of up to d_max = Nℓ + Mk − kℓ disk failures can also be tolerated. Three options for HRAID operation are: (I) Only intranode recovery. (II) Intranode and internode recovery, with on-demand reconstruction of blocks and rebuild. (III) Multistep internode recovery with no rebuild processing. The I/Os Per Second (IOPS) metric is used to assess the cost of fault-tolerance for HRAIDk/ℓ against RAID(4+ℓ) and RAID0, for varying k and ℓ. The maximum IOPS is at its lowest in degraded mode, but even with fewer operational disks the normal mode IOPS may be exceeded after restriping. Asymptotic reliability analysis and simulation results show that HRAIDk/ℓ with ℓ > k provides a higher reliability when SN failures are due to disk rather than controller failures. Monte Carlo simulation is used to quantify the effect on the Mean Time to Data Loss (MTTDL) of the various recovery options, for varying k and ℓ and as the SN controller failure rate is varied relative to the disk failure rate. The HRAID paradigm is justified by the fact that Option II attains a significantly higher MTTDL than Option I. Option III with no rebuild processing has an MTTDL exceeding that of Option II, but poorer performance.
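
The closed-form quantities quoted in the abstract can be evaluated directly. The following is a minimal sketch, not code from the paper: the class name `HraidConfig`, its field names, and the range checks `0 <= k < N` and `0 <= ℓ < M` are assumptions introduced here for illustration. Only the three formulas (d_min, d_max, and the storage redundancy for N = M) come from the abstract. One way to read d_max, offered as an interpretation rather than the authors' derivation, is k entirely failed SNs (M·k disks) plus ℓ failed disks in each of the remaining N − k SNs, which expands to the same expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HraidConfig:
    """Hypothetical container for the HRAIDk/l parameters named in the abstract."""
    n_nodes: int         # N: number of storage nodes (SNs)
    disks_per_node: int  # M: number of disks per SN
    k: int               # SN failures tolerated
    ell: int             # disk failures tolerated per SN

    def __post_init__(self):
        # Assumed sanity checks; the abstract does not state these bounds.
        if not 0 <= self.k < self.n_nodes:
            raise ValueError("k must satisfy 0 <= k < N")
        if not 0 <= self.ell < self.disks_per_node:
            raise ValueError("ell must satisfy 0 <= ell < M")

    def d_min(self) -> int:
        # Every pattern of up to this many disk failures is tolerated.
        return (self.k + 1) * (self.ell + 1) - 1

    def d_max(self) -> int:
        # Largest number of disk failures that some pattern can survive.
        return (self.n_nodes * self.ell
                + self.disks_per_node * self.k
                - self.k * self.ell)

    def redundancy(self) -> float:
        # The abstract gives (k + l) / N only for the square case N = M.
        if self.n_nodes != self.disks_per_node:
            raise ValueError("closed form is stated only for N = M")
        return (self.k + self.ell) / self.n_nodes


if __name__ == "__main__":
    cfg = HraidConfig(n_nodes=12, disks_per_node=12, k=1, ell=2)
    print(cfg.d_min(), cfg.d_max(), cfg.redundancy())  # 5 34 0.25
```

For the illustrative choice N = M = 12, k = 1, ℓ = 2, any 5 disk failures are tolerated, up to 34 can be tolerated in favourable patterns, and a quarter of the raw capacity is spent on check strips.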


► Hierarchical RAID copes with controller failures similarly to disk failures.
► Maximum Distance Separable coding is used in both cases.
► The level of redundancy can be adjusted across disks and nodes.
► Concurrency control for storage transactions.

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 72, Issue 12, December 2012, Pages 1753–1769