Article code: 493830
Journal code: 722916
Publication year: 2013
English article: 13 pages, PDF
Full-text version: free download
English title of the ISI article
Maximizing the detection probability of overheating server components with sensor placement based on thermal dynamics
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

Server overheating has become a well-known issue in today's data centers that host a large number of high-density servers. The current practice of server overheating detection is to monitor the server inlet temperature with the temperature sensor on the server enclosure, or the CPU temperature with on-die thermal sensors. However, this is in contrast to the fact that different components in a server may have different overheating thresholds, which are closely related to their respective thermal failure rates and expected lifetimes. Moreover, the thermal correlation between the inlet (or CPU) and other server components can be different for every server model. As a result, relying on the single inlet or CPU temperature for server overheating detection is over-simplistic, which may lead to either degraded detection performance or false alarms that can result in excessive cooling power, leading to unnecessarily low inlet temperature.

In this paper, we propose a model-based approach that leverages thermal dynamics to intelligently choose sensor placement locations for precise overheating server component detection. We first formulate the detection problem as a constrained optimization problem. We then adopt Computational Fluid Dynamics (CFD) to establish the thermal model and analyze the thermal status of the server enclosure under various overheating conditions, such as inlet overheating, fan failures and CPU overloading. Based on the CFD analysis, we apply data fusion and advanced optimization techniques to find a near-optimal solution for sensor placement locations, such that the probability of detecting different overheating components is significantly improved. Our empirical results on a real rack server testbed demonstrate the detection performance of our solution. Extensive simulation results also show that the proposed solution outperforms other commonly used overheating monitoring solutions in terms of detection probability and error rate.
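The abstract does not say which optimization technique the authors use, so the following is only an illustrative sketch of the general idea of choosing sensor locations under a budget so that as many overheating scenarios as possible are detected. It substitutes a simple greedy coverage heuristic for the paper's unspecified method. The location names, the `detects` table (which in practice might be derived from CFD simulations) and the function names are all hypothetical.

```python
def greedy_placement(detects, budget):
    """Pick up to `budget` sensor locations by greedy coverage.

    detects: dict mapping a candidate location (hypothetical names) to the
             set of overheating scenarios a sensor at that location would flag.
    Returns (chosen locations, set of scenarios detected by at least one).
    NOTE: a stand-in heuristic, not the method used in the paper.
    """
    chosen = []
    covered = set()
    for _ in range(budget):
        remaining = [loc for loc in detects if loc not in chosen]
        if not remaining:
            break
        # Pick the location that flags the most not-yet-covered scenarios.
        best = max(remaining, key=lambda loc: len(detects[loc] - covered))
        if not detects[best] - covered:
            break  # no remaining location adds any coverage
        chosen.append(best)
        covered |= detects[best]
    return chosen, covered


def detection_fraction(covered, scenarios):
    """Fraction of scenarios detected by at least one chosen sensor."""
    return len(covered & scenarios) / len(scenarios)


# Hypothetical example data: scenario names follow the abstract,
# the detection table itself is made up for illustration.
scenarios = {"inlet_overheating", "fan_failure", "cpu_overloading"}
detects = {
    "inlet": {"inlet_overheating"},
    "cpu": {"cpu_overloading", "fan_failure"},
    "dimm": {"inlet_overheating", "fan_failure"},
    "disk": {"fan_failure"},
}

chosen, covered = greedy_placement(detects, budget=2)
print(chosen, detection_fraction(covered, scenarios))
```

In this toy table no single location flags all three scenarios, which mirrors the abstract's point that relying on the inlet or CPU sensor alone can miss overheating elsewhere in the enclosure.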

Publisher
Database: Elsevier - ScienceDirect
Journal: Sustainable Computing: Informatics and Systems - Volume 3, Issue 3, September 2013, Pages 148–160