Uncertainty estimation with a finite dataset in the assessment of classification models

Article code	Journal code	Year	Full text (English)
415012	681158	2012	12-page PDF


Keywords

Microarray classification; Uncertainty

Related subjects

Engineering and Basic Sciences > Computer Engineering > Computational Theory and Mathematics


Abstract

To successfully translate genomic classifiers into clinical practice, it is essential to obtain reliable and reproducible measurements of classifier performance. A point estimate of classifier performance must be accompanied by a measure of its uncertainty. In general, this uncertainty arises from both the finite size of the training set and the finite size of the test set. The training variability is a measure of classifier stability and is particularly important when the training sample size is small. Methods have been developed for estimating such variability for the performance metric AUC (area under the ROC curve) under two paradigms: a smoothed cross-validation paradigm and an independent validation paradigm. The methodology is demonstrated on three clinical microarray datasets from the MicroArray Quality Control consortium phase II project (MAQC-II): breast cancer, multiple myeloma, and neuroblastoma. The results show that classifier performance is associated with large variability and that the estimated performance may change dramatically across datasets. Moreover, the training variability is found to be of the same order as the testing variability for the datasets and models considered. In conclusion, the feasibility of quantifying both training and testing variability of classifier performance is demonstrated on finite real-world datasets. The large variability of the performance estimates shows that patient sample size is still the bottleneck of the microarray problem and that the training variability is not negligible.
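For the testing part of the uncertainty alone, the following sketch considers a fixed, already-trained classifier. The empirical AUC is the Mann-Whitney statistic, and one standard nonparametric estimate of its standard error is the variance of DeLong et al. (1988). This is offered only as an illustration of a test-set variance estimate; it is not necessarily the estimator used by the authors, and it ignores training variability entirely. The function name `auc_with_delong_se` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 on a tie, else 0."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def auc_with_delong_se(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float]:
    """Return (AUC, standard error) for one fixed classifier's test-set scores.

    pos: scores of positive test cases; neg: scores of negative test cases.
    The SE reflects only the finite size of the test set (DeLong-style variance).
    """
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two test cases in each class")
    # Structural components: each case's average kernel value against the other class.
    v10 = [sum(_psi(x, y) for y in neg) / n for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / m for y in neg]
    auc = sum(v10) / m  # equals the mean of v01 as well
    # Sample variances of the structural components.
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    return auc, math.sqrt(s10 / m + s01 / n)
```

For example, `auc_with_delong_se([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])` gives AUC = 8/9 (about 0.889) and SE = sqrt(2)/9 (about 0.157). A rough normal-approximation interval is then AUC ± 1.96·SE. With only three cases per class, this interval is wide, which is the small-sample effect the abstract describes.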

► We quantify classifier variability arising from both training and testing.
► Microarray classifier accuracy has large variability and is dataset-dependent.
► Patient sample size is still the bottleneck of the microarray problem.
► Training variability is not negligible.
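The split between training and testing variability can be illustrated with a toy simulation. This is not the authors' estimator: the paper works with real MAQC-II data under smoothed cross-validation and independent validation paradigms. Everything below is a hypothetical setup chosen for clarity:

- two Gaussian classes with identity covariance;
- a centroid-difference linear classifier;
- the helper names `variance_components`, `draw`, and `centroid`.

Each replicate trains on a fresh training set and computes that classifier's exact population AUC, A(w). It then estimates A(w) on a fresh test set. By the law of total variance, Var(Â) = Var[A(w)] + E[Var(Â | w)], which is the training term plus the testing term.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence

Vector = List[float]


def mann_whitney_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly, ties count 1/2."""
    wins = 0.0
    for x in pos:
        for y in neg:
            wins += 1.0 if x > y else (0.5 if x == y else 0.0)
    return wins / (len(pos) * len(neg))


def draw(rng: random.Random, mean: Vector, size: int) -> List[Vector]:
    """Draw `size` samples from N(mean, I)."""
    return [[rng.gauss(mu, 1.0) for mu in mean] for _ in range(size)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def centroid(samples: List[Vector]) -> Vector:
    return [sum(col) / len(samples) for col in zip(*samples)]


def variance_components(n_train: int, n_test: int, dim: int = 50, n_informative: int = 5,
                        shift: float = 0.5, reps: int = 200, seed: int = 0) -> Dict[str, float]:
    """Hypothetical simulation splitting the variance of a test-set AUC into two parts.

    Only the first `n_informative` of `dim` features differ in mean (by `shift`);
    the rest are pure noise, loosely mimicking many uninformative genes.
    """
    rng = random.Random(seed)
    mean_pos = [shift] * n_informative + [0.0] * (dim - n_informative)
    mean_neg = [0.0] * dim
    delta = [p - q for p, q in zip(mean_pos, mean_neg)]
    cond_aucs, test_aucs = [], []
    for _ in range(reps):
        # 1) Train on a fresh finite training set: w = difference of class centroids.
        w = [p - q for p, q in zip(centroid(draw(rng, mean_pos, n_train)),
                                   centroid(draw(rng, mean_neg, n_train)))]
        # 2) Exact AUC of this trained classifier on the population:
        #    w.(X - Y) ~ N(w.delta, 2|w|^2), so A(w) = Phi(w.delta / (sqrt(2)|w|)).
        z = dot(w, delta) / (math.sqrt(2.0) * math.sqrt(dot(w, w)))
        cond_aucs.append(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
        # 3) Estimate A(w) on a fresh finite test set.
        pos_scores = [dot(w, x) for x in draw(rng, mean_pos, n_test)]
        neg_scores = [dot(w, x) for x in draw(rng, mean_neg, n_test)]
        test_aucs.append(mann_whitney_auc(pos_scores, neg_scores))
    return {
        "mean_auc": statistics.mean(test_aucs),
        "var_total": statistics.variance(test_aucs),
        # Training term: spread of the true AUC across training sets.
        "var_train": statistics.variance(cond_aucs),
        # Testing term: E[(A_hat - A(w))^2] = E[Var(A_hat | w)], since the
        # Mann-Whitney statistic is unbiased for the conditional AUC A(w).
        "var_test": statistics.mean((a - c) ** 2 for a, c in zip(test_aucs, cond_aucs)),
    }
```

Comparing `variance_components(n_train=10, n_test=30)` with a larger `n_train` shows the training term shrinking as the training set grows. In this toy model, `var_train + var_test` approximates `var_total` only up to Monte Carlo error.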

Publisher

Database: Elsevier - ScienceDirect
Journal: Computational Statistics & Data Analysis - Volume 56, Issue 5, 1 May 2012, Pages 1016–1027

Authors

Weijie Chen, Waleed A. Yousef, Brandon D. Gallas, Elizabeth R. Hsu, Samir Lababidi, Rong Tang, Gene A. Pennello, W. Fraser Symmans, Lajos Pusztai
