Classification of Post-deployment Performance Diagnostic Techniques for Large-scale Software Systems

Article ID	Journal	Published Year	Pages	File Type
487644	Procedia Computer Science	2014	8 Pages	PDF

Abstract

Today's large-scale software systems (LSSs), such as those of Facebook, Google, Amazon, and many other contemporary datacenters, comprise hundreds or thousands of machines running complex applications that require high availability and responsiveness. These LSSs must be carefully monitored for performance bottlenecks before serious harm is done. Performance analysts have to deal with the tedious task of monitoring the performance of these LSSs to avoid any service level agreement (SLA) violations and to ensure failure-free operation. Several post-deployment performance diagnostic (PPD) techniques exist to help analysts diagnose performance problems in the field, i.e., after the software is deployed. However, there is no classification of the proposed PPD techniques that explains their objectives and characteristics. In this paper, we classify the existing PPD techniques along multiple categories. The classification will provide a guideline for performance analysts and LSS practitioners to choose techniques suitable for their needs. Moreover, it will help researchers understand and fill gaps, i.e., dedicate their research efforts to categories that have received little attention in the past.