Article code: 1179460; Journal code: 1491546; Publication year: 2014; English article: 9-page PDF; Full-text version: free download
English title of the ISI article
MLRMPA: An R package of multiple linear regression model population analysis based on a cluster sampling technique for variable selection of high dimensional data
Related topics
Engineering and Basic Sciences > Chemistry > Analytical Chemistry
English abstract


• A new R package, MLRMPA, is introduced to perform the VarCor-MLRMPA algorithm.
• The VarCor-MLRMPA method consists of in-cluster sampling and model building.
• In-cluster sampling draws descriptors from the clustered descriptors by Monte Carlo sampling (MCS).
• MLRMPA modeling builds a stepwise linear model from the sampled features.
• The VarCor-MLRMPA algorithm is used to predict the response and to detect outliers.
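
The in-cluster sampling step in the highlights can be illustrated with a minimal Python sketch. It is not the MLRMPA package's API: the function name, the descriptor names and the cluster membership are all hypothetical, and it assumes that exactly one descriptor is drawn uniformly at random from each cluster.

```python
import random


def sample_one_per_cluster(clusters, rng):
    """Draw one descriptor at random from each non-empty cluster."""
    return [rng.choice(members) for members in clusters if members]


# Hypothetical descriptor clusters; the names are illustrative only.
clusters = [
    ["logP", "MR", "TPSA"],
    ["nHBDon", "nHBAcc"],
    ["MW"],
]

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
subsets = [sample_one_per_cluster(clusters, rng) for _ in range(5)]
for subset in subsets:
    print(subset)
```

Each printed subset holds one descriptor per cluster, so strongly related descriptors that share a cluster never enter the same candidate model.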

We develop an R package, MLRMPA, for fitting a pool of models between a response variable and chemical descriptors. It is an embedded method that combines feature selection with model building. The feature selection procedure is a cluster sampling method and differs from the model population analysis (MPA) implemented in a previously published study. The modeling process performs multiple stepwise regression analysis using the features sampled from the clustered groups. This paper describes the algorithm and methods implemented in the R package, which include VarCor feature selection, cluster sampling, model building and model checking. The package is applied to establish an optimal linear model for predicting the response and to detect outliers from sub-optimal models.

The graphical abstract shows the workflow of the VarCor-MLRMPA method. The core idea is to cluster all pre-selected descriptors into groups and then randomly select one variable from the clustered descriptors by the Monte Carlo sampling technique, making up a subset of variables for building a multiple stepwise linear model. Sampling N times yields N models, from which an optimal model can be extracted.
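
The "N samplings, N models, pick the optimum" loop can be sketched as follows. This is a hedged illustration and not the package's implementation: ordinary least squares stands in for the stepwise regression, the held-out RMSE used to rank the models is an assumed criterion, the data are synthetic, and all function and variable names are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(X, y, cols):
    """Least-squares coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + [r[c] for c in cols] for r in X]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def rmse(X, y, cols, beta):
    """Root-mean-square error of the fitted model on (X, y)."""
    preds = [beta[0] + sum(b * r[c] for b, c in zip(beta[1:], cols)) for r in X]
    return (sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)) ** 0.5


# Synthetic data: 6 descriptors in 3 assumed clusters (column indices).
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(60)]
y = [2 * r[0] - r[2] + 0.5 * r[4] + rng.gauss(0, 0.1) for r in X]
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]
clusters = [[0, 1], [2, 3], [4, 5]]

N = 20  # number of Monte Carlo samplings, hence number of models
models = []
for _ in range(N):
    cols = [rng.choice(members) for members in clusters]  # one per cluster
    beta = fit_ols(X_train, y_train, cols)
    models.append({"cols": cols, "beta": beta,
                   "rmse": rmse(X_test, y_test, cols, beta)})

best = min(models, key=lambda mdl: mdl["rmse"])
print(best["cols"], round(best["rmse"], 3))
```

Keeping the whole population in `models`, rather than only `best`, mirrors the abstract's point that the sub-optimal models are also used, for example when looking for outliers.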

Publisher
Database: Elsevier - ScienceDirect
Journal: Chemometrics and Intelligent Laboratory Systems - Volume 132, 15 March 2014, Pages 124–132