Article code: 433025
Journal code: 689211
Publication year: 2014
English article: 16 pages, PDF
Full-text version: free download
English title of the ISI article
Experience with using the Parallel Workloads Archive
Persian translation of the title
تجربه با استفاده از بایگانی موازی بار کار
Keywords
Workload log, data quality, parallel job scheduling
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract


• Reliable performance evaluations require reliable data about workloads.
• Workload logs may have data quality problems despite being automatically generated.
• Finding data quality problems is hard and findings must be reported.
• In some cases data quality can be improved, e.g. by filtering out dubious data.
• Such data analysis and cleaning is an important component of the scientific method.

Science is based upon observation. The scientific study of complex computer systems should therefore be based on observation of how they are used in practice, as opposed to how they are assumed to be used or how they were designed to be used. In particular, detailed workload logs from real computer systems are invaluable for research on performance evaluation and for designing new systems.

Regrettably, workload data may suffer from quality issues that might distort the study results, just as scientific observations in other fields may suffer from measurement errors. The cumulative experience with the Parallel Workloads Archive, a repository of job-level usage data from large-scale parallel supercomputers, clusters, and grids, has exposed many such issues. Importantly, these issues were not anticipated when the data was collected, and uncovering them was not trivial. As the data in this archive is used in hundreds of studies, it is necessary to describe and debate procedures that may be used to improve its data quality. Specifically, we consider issues like missing data, inconsistent data, erroneous data, system configuration changes during the logging period, and unrepresentative user behavior. Some of these may be countered by filtering out the problematic data items. In other cases, being cognizant of the problems may affect the decision of which datasets to use. While grounded in the specific domain of parallel jobs, our findings and suggested procedures can also inform similar situations in other domains.

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 74, Issue 10, October 2014, Pages 2967–2982