Article code | Journal code | Publication year | English article | Full-text version
10140733 | 1646045 | 2018 | 11-page PDF | Free download
English title of the ISI article
Effects of data pre-processing methods on classification of ATR-FTIR spectra of pen inks using partial least squares-discriminant analysis (PLS-DA)
Related topics
Engineering and Basic Sciences > Chemistry > Analytical Chemistry
English abstract
In response to our review paper [L.C. Lee et al., Chemom. Intell. Lab. Syst. 163 (2017) 64-75], we present a study that explores the practical impact of data preprocessing (DP) methods on ATR-FTIR spectra. Nine common DP methods, i.e. mean centering (MC), autoscaling (AS), Pareto scaling, robust scaling, multiplicative scatter correction (MSC), normalization to sum (NS), normalization to constant vector length (NV), standard normal variate and asymmetric least squares (AsLS), were chosen for their availability in the R software and their rather simple computation steps. An ATR-FTIR spectral dataset of blue gel pen inks originating from 10 different manufacturers (i.e. brands) was used in this work. The dataset is large (N = 1361), high-dimensional (J = 5401), multi-class (C = 10), and imbalanced. To examine the impact of substrate interference, the global spectral region was further divided, arbitrarily, into three mutually exclusive local regions that were analyzed independently. The resulting four sub-datasets (i.e. one based on the global region and three based on local regions) were then preprocessed independently with each DP method, producing 40 different sub-datasets including the raw counterparts (4 regions × 10 treatments, i.e. nine DP methods plus raw). Partial least squares-discriminant analysis (PLS-DA) was chosen to construct a series of 50 models by including the first 50 PLS components incrementally. The modeling was performed independently for each of the 40 sub-datasets. Each model was evaluated repeatedly using autoprediction, six variants of v-fold cross-validation (v = 2, 4, 5, 7, 10, 15) and external testing. As a result, the empirical performance of each DP method is represented by 400 different error rates (8 model validation schemes × 50 models). The performance of each DP method was then compared against its raw counterpart using summary statistics and hypothesis tests.
In addition, principal component analysis and hierarchical clustering analysis were employed to illustrate, respectively, the spatial distribution of and the similarity between the nine DP methods and the raw counterparts. Several important remarks can be drawn from the rigorous comparative analyses. First, owing to the inherent properties of ATR-FTIR spectra, DP methods that handle slope, e.g. MSC and AsLS, appeared to be the best-performing DP methods. Second, the normalization methods, NS and NV, ranked second. Third, MC showed no impact on the raw IR spectral dataset. Fourth, it is shown that outliers in the ATR-FTIR spectra of pen inks could be localized. Finally, removal of irrelevant signals arising from the sample substrate is best achieved via region truncation rather than via PLS or DP methods alone.
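As an illustration of the simpler DP methods named in the abstract, the following is a minimal Python sketch using common textbook definitions. It is not the authors' implementation: the abstract states that the study used R, and the function names, the use of absolute intensities in NS, and the mean spectrum as the MSC reference are all assumptions made here. Robust scaling and AsLS are omitted. The input `X` is assumed to be an N × J array with one spectrum per row.

```python
import numpy as np

# Column-wise methods act on each variable (wavenumber) across all spectra.
# Note: a constant column has zero standard deviation and would divide by zero.

def mean_center(X):
    """MC: subtract each variable's mean."""
    return X - X.mean(axis=0)

def autoscale(X):
    """AS: mean-center, then divide by each variable's standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Pareto: mean-center, then divide by the square root of the standard deviation."""
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))

# Row-wise methods act on each spectrum independently.

def normalize_sum(X):
    """NS: divide each spectrum by the sum of its absolute intensities (assumed variant)."""
    return X / np.abs(X).sum(axis=1, keepdims=True)

def normalize_vector(X):
    """NV: divide each spectrum by its Euclidean norm."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def snv(X):
    """SNV: center and scale each spectrum by its own mean and standard deviation."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mu) / sd

def msc(X, reference=None):
    """MSC: regress each spectrum on a reference, then remove offset and slope."""
    ref = X.mean(axis=0) if reference is None else reference
    out = np.empty_like(X, dtype=float)
    for i, x in enumerate(X):
        slope, intercept = np.polyfit(ref, x, 1)  # fit x ≈ slope * ref + intercept
        out[i] = (x - intercept) / slope
    return out

if __name__ == "__main__":
    # Synthetic stand-in for spectra; the real dataset is not reproduced here.
    rng = np.random.default_rng(0)
    X_demo = rng.random((6, 20)) + 1.0
    for f in (mean_center, autoscale, pareto_scale,
              normalize_sum, normalize_vector, snv, msc):
        print(f.__name__, f(X_demo).shape)
```

The split between column-wise and row-wise operations mirrors the distinction the abstract draws between scaling methods (MC, AS, Pareto) and methods that correct each spectrum individually (NS, NV, SNV, MSC).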
Publisher
Database: Elsevier - ScienceDirect
Journal: Chemometrics and Intelligent Laboratory Systems - Volume 182, 15 November 2018, Pages 90-100