Article code: 1248700
Journal code: 970456
Publication year: 2006
English article: 9 pages, PDF
Full-text version: free download
English title of the ISI article
Consequences of sample size, variable selection, and model validation and optimisation, for predicting classification ability from analytical data
Related topics
Engineering and Basic Sciences, Chemistry, Analytical Chemistry
Abstract

This article discusses problems of validating classification models, especially in datasets where sample sizes are small and the number of variables is large. It describes the use of percentage correctly classified (%CC) as an indicator of the success of a classification model. For small datasets, %CC should not be used uncritically, and its interpretation depends on sample size. The article illustrates the use of a common classification method, discriminant partial least squares (D-PLS), on a randomly generated dataset of 200 samples and 200 variables.

An aim of the classifier is to determine whether the null hypothesis (there is no distinction between two classes) can be rejected. Autoprediction gives a %CC of 84.5%. It is shown that, if there is variable selection, it must be performed independently on the training set to obtain a %CC close to 50% on the test set; otherwise, over-optimistic and false conclusions can be reached about the ability to classify samples into groups.

Finally, two aims of determining the quality of a model are frequently confused, namely optimisation (often used to determine the most appropriate number of components in a model) and independent validation; to overcome this, the data should be split into three groups.

There are often difficulties with model building if validation and optimisation have been done on different groups of samples, especially using iterative methods, with each group being modelled using different properties, such as a different number of components or different variables.
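The variable-selection point in the abstract can be illustrated with a minimal sketch. It is not the article's procedure: it assumes NumPy, swaps D-PLS for a simple nearest-centroid classifier, and uses a hypothetical scaled mean-difference score to rank variables. The function names, the choice of 20 selected variables, and the 50/50 split are all assumptions made for illustration. Only the 200 samples by 200 variables of pure random data follow the abstract.

```python
import numpy as np


def select_variables(X, y, k):
    """Return indices of the k variables with the largest scaled class-mean difference."""
    mean_diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    score = np.abs(mean_diff) / (X.std(axis=0) + 1e-12)
    return np.argsort(score)[-k:]


def centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class with the nearer training centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)


def percent_cc(y_true, y_pred):
    """Percentage correctly classified (%CC)."""
    return 100.0 * float(np.mean(y_true == y_pred))


def compare(n_repeats=50, n=200, p=200, k=20, seed=0):
    """Mean test-set %CC when variables are selected on all data vs. on the training set only."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n // 2)  # two classes with no real distinction
    half = n // 4                  # training samples per class
    biased, honest = [], []
    for _ in range(n_repeats):
        X = rng.standard_normal((n, p))  # pure noise: the null hypothesis is true
        idx0 = rng.permutation(np.flatnonzero(y == 0))
        idx1 = rng.permutation(np.flatnonzero(y == 1))
        train = np.concatenate([idx0[:half], idx1[:half]])
        test = np.concatenate([idx0[half:], idx1[half:]])
        sel_all = select_variables(X, y, k)                   # test samples leak into selection
        sel_train = select_variables(X[train], y[train], k)   # selection sees training set only
        for sel, scores in ((sel_all, biased), (sel_train, honest)):
            pred = centroid_predict(X[train][:, sel], y[train], X[test][:, sel])
            scores.append(percent_cc(y[test], pred))
    return float(np.mean(biased)), float(np.mean(honest))


if __name__ == "__main__":
    biased_cc, honest_cc = compare()
    print(f"selection on all samples:   mean test %CC = {biased_cc:.1f}")
    print(f"selection on training only: mean test %CC = {honest_cc:.1f}")
```

Under these assumptions, selecting variables on the full dataset tends to push the test-set %CC well above chance even though the classes are indistinguishable, while selecting on the training set alone keeps it near 50%.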

Publisher
Database: Elsevier - ScienceDirect
Journal: TrAC Trends in Analytical Chemistry - Volume 25, Issue 11, December 2006, Pages 1103–1111