Article code | Journal code | Publication year | English article | Full-text version
11002421 | 1440625 | 2018 | 12-page PDF | Free download
English title of the ISI article
Detecting performance anomalies in scientific workflows using hierarchical temporal memory
Persian translation of the title
تشخیص ناهنجاری های عملکرد در جریان های علمی با استفاده از حافظه زمانی سلسله مراتبی
Keywords
Online anomaly detection, scientific workflows, hierarchical temporal memory, performance anomalies
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract
Technological advances and the emergence of the Internet of Things have led to the collection of vast amounts of scientific data from increasingly powerful scientific instruments and a growing number of distributed sensors. This has not only heightened the significance of the analyses performed by scientific applications but has also increased their complexity and scale. Hence, emerging extreme-scale scientific workflows are becoming widespread and so is the need to efficiently automate their deployment on a variety of platforms such as high performance computers, dedicated clusters, and cloud environments. Performance anomalies can considerably affect the execution of these applications. They may be caused by different factors including failures and resource contention and they may lead to undesired circumstances such as lengthy delays in the workflow runtime or unnecessary costs in cloud environments. As a result, it is essential for modern workflow management systems to enable the early detection of this type of anomaly, to identify its cause, and to formulate and execute actions to mitigate its effects. In this work, we propose the use of Hierarchical Temporal Memory (HTM) to detect performance anomalies on real-time infrastructure metrics collected by continuously monitoring the resource consumption of executing workflow tasks. The framework is capable of processing a stream of measurements in an online and unsupervised manner and is successful in adapting to changes in the underlying statistics of the data. This allows it to be easily deployed on a variety of infrastructure platforms without the need to collect data and train a model beforehand. We evaluate our approach by using two real scientific workflows deployed in Microsoft Azure's cloud infrastructure.
Our experimental results demonstrate the ability of our model to accurately capture performance anomalies on different resource consumption metrics caused by a variety of competing workloads introduced into the system. A performance comparison of HTM to other online anomaly detection algorithms is also presented, demonstrating the suitability of the chosen algorithm for the problem presented in this work.
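The abstract describes scoring a stream of resource-consumption measurements online and without prior training, while adapting to shifts in the data's statistics. The sketch below illustrates that general setting only. It is not HTM and not the authors' method: it swaps in a much simpler exponentially weighted z-score detector. The class name `StreamingZScoreDetector` and the values chosen for `alpha`, `threshold`, and `warmup` are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamingZScoreDetector:
    """Online, unsupervised anomaly detector (a simple stand-in, NOT HTM).

    Keeps an exponentially weighted mean and variance of one metric, so older
    samples fade out and the model follows changes in the data's statistics.
    All default values below are illustrative assumptions.
    """

    alpha: float = 0.1      # forgetting factor: larger means faster adaptation
    threshold: float = 4.0  # flag samples more than this many std devs away
    warmup: int = 10        # number of initial samples that are never flagged
    mean: float = 0.0
    var: float = 0.0
    seen: int = 0

    def update(self, x: float) -> bool:
        """Score x against the current model, then fold x into the model."""
        self.seen += 1
        if self.seen == 1:
            # First sample only initializes the running mean.
            self.mean = x
            return False

        diff = x - self.mean
        std = math.sqrt(self.var)
        is_anomaly = (
            self.seen > self.warmup
            and std > 0.0
            and abs(diff) / std > self.threshold
        )

        # Exponentially weighted updates of mean and variance.
        self.mean += self.alpha * diff
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * diff * diff)
        return is_anomaly
```

A caller would feed one measurement at a time, for example a CPU utilization sample per monitoring interval, and act on the returned flag. No data has to be collected in advance, which mirrors the deployment property the abstract emphasizes, although a detector this simple cannot learn temporal patterns the way HTM is designed to.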
Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 88, November 2018, Pages 624-635
Authors
, , ,