Article ID: 425628 | Journal: Future Generation Computer Systems | Published: 2015 | Pages: 13 | File type: PDF
Abstract

• A PWLM³-based automatic performance model estimation method is proposed.
• A model base is built to standardize the model representation of submodels.
• A cluster quality assessment strategy is used to evaluate the number of submodels.
• A submodel selection strategy is applied to build a set of performance model candidates.
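The highlights mention a cluster quality assessment strategy for choosing the number of submodels, but the abstract does not name the clustering algorithm or the quality index. The following is a minimal sketch that fills that gap with assumptions: one-dimensional k-means as the clustering step and the mean silhouette coefficient as the quality index. Both are stand-ins, not the paper's method, and the function names `kmeans_1d`, `silhouette`, and `choose_num_submodels` are hypothetical.

```python
import statistics

# Assumption: k-means + silhouette stand in for the paper's unnamed
# cluster quality assessment strategy.


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on 1-D data; returns (sorted data, cluster labels)."""
    data = sorted(values)
    n = len(data)
    # Deterministic start: spread the k initial centers over the sorted data.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in data]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = statistics.fmean(members)
    return data, labels


def silhouette(data, labels):
    """Mean silhouette coefficient in [-1, 1]; higher means better-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat as worst
    scores = []
    for i, x in enumerate(data):
        own = [abs(x - y) for j, y in enumerate(data) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = statistics.fmean(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            statistics.fmean([abs(x - y) for y, lab in zip(data, labels) if lab == c])
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return statistics.fmean(scores)


def choose_num_submodels(values, k_max=6):
    """Try k = 2..k_max clusters and keep the k with the best silhouette."""
    best_k, best_score = None, float("-inf")
    for k in range(2, min(k_max, len(values) - 1) + 1):
        score = silhouette(*kmeans_1d(values, k))
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score
```

For example, `choose_num_submodels([1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 9.0, 9.1, 9.2])` picks k = 3, because splitting any of the three tight groups lowers the mean silhouette. The silhouette is used here only because it rewards compact, well-separated groups without needing labeled data.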

There is a growing need for an automatic performance model estimation method for Hadoop Distributed File System (HDFS) write and read (W/R) operations, in order to cope with continual software improvements and updates, parameter configuration changes, and hardware heterogeneity, and to support Quality of Service (QoS) evaluation of these operations. Existing research based on a single linear system model has limited ability to explain the performance variations caused by changes in HDFS parameters such as block size. These variations exhibit typical characteristics of nonlinear systems and are an obstacle to effective automatic performance estimation. To address this challenge, a piecewise-linear multi-model modeling (PWLM³)-based automatic performance model estimation method is proposed for HDFS W/R performance. In the proposed method, a standard model base is built to standardize the model representation of every submodel. Moreover, a cluster quality assessment strategy is applied to evaluate the optimal number of submodels, and a submodel selection strategy is implemented to construct performance model candidates and improve the computational efficiency of the method. In addition, the Levenberg–Marquardt (LM) and Universal Global Optimization (UGO) algorithms are adopted to estimate the switch points and identify the undetermined parameters of the performance model candidates. The final performance model is then selected from these candidates according to the Root Mean Squared Error (RMSE). Experimental results demonstrate that the PWLM³-based performance model captures and describes the nonlinear characteristics of HDFS W/R performance well and achieves better identification precision than a model based on a single linear system.
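The abstract combines two steps: estimating switch points and submodel parameters, then keeping the candidate with the lowest RMSE. A minimal sketch of that pipeline for a single switch point is shown below. It does not use LM or UGO; it swaps in an exhaustive search over candidate switch points with a closed-form ordinary least-squares line on each side, which is only practical for small one-dimensional data sets. Reading x as a tuning parameter such as block size is an assumption, and `fit_line`, `rmse`, `make_piecewise`, `fit_piecewise`, and `select_model` are hypothetical names.

```python
import math

# Assumption: exhaustive switch-point search + OLS replace the paper's
# LM/UGO estimation; only one switch point (two submodels) is supported.


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def rmse(xs, ys, predict):
    """Root mean squared error of predict(x) against the observed ys."""
    return math.sqrt(sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


def make_piecewise(switch, left, right):
    """Two linear submodels: `left` applies for x < switch, `right` otherwise."""
    def predict(x):
        a, b = left if x < switch else right
        return a + b * x
    return predict


def fit_piecewise(xs, ys, min_pts=2):
    """Try every switch point; returns (rmse, switch, predict) of the best fit."""
    pts = sorted(zip(xs, ys))  # assumes distinct x values
    if len(pts) < 2 * min_pts:
        raise ValueError("not enough points for two submodels")
    best = None
    for i in range(min_pts, len(pts) - min_pts + 1):
        left = fit_line(*zip(*pts[:i]))
        right = fit_line(*zip(*pts[i:]))
        switch = pts[i][0]  # first x handled by the right submodel
        predict = make_piecewise(switch, left, right)
        err = rmse(xs, ys, predict)
        if best is None or err < best[0]:
            best = (err, switch, predict)
    return best


def select_model(xs, ys):
    """Keep whichever candidate (one line vs. two submodels) has lower RMSE."""
    a, b = fit_line(xs, ys)
    single_err = rmse(xs, ys, lambda x: a + b * x)
    pw_err, switch, _ = fit_piecewise(xs, ys)
    if pw_err < single_err:
        return "piecewise", pw_err, switch
    return "single", single_err, None
```

On synthetic data whose level and slope change between x = 5 and x = 6, e.g. `xs = list(range(1, 11))` and `ys = [2.0 * x if x <= 5 else 20.0 + 0.5 * x for x in xs]`, `select_model(xs, ys)` returns the piecewise candidate with switch point 6 and an RMSE of 0, whereas a single line leaves a clearly nonzero residual. Note that on the same training data two independently fitted lines can never score a higher RMSE than one line, so a practical selection step would compare RMSE on held-out measurements or penalize extra submodels; the sketch keeps the plain comparison for brevity.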
