Article code  Journal code  Publication year  English article  Full-text version
1151213  958201  2006  14 pages  PDF (free download)
English title of the ISI article
Feature selection and machine learning with mass spectrometry data for distinguishing cancer and non-cancer samples
Related topics
Engineering and Basic Sciences, Mathematics, Statistics and Probability
Abstract

This is a comparative study of various clustering and classification algorithms as applied to differentiate cancer and non-cancer protein samples using mass spectrometry data. Our study demonstrates the usefulness of a feature selection step prior to applying a machine learning tool. A natural and common choice of a feature selection tool is the collection of marginal p-values obtained from t-tests for testing the intensity differences at each m/z ratio in the cancer versus non-cancer samples. We study the effect of selecting a cutoff in terms of the overall Type 1 error rate control on the performance of the clustering and classification algorithms using the significant features. For the classification problem, we also considered m/z selection using the importance measures computed by the Random Forest algorithm of Breiman. Using a data set of proteomic analysis of serum from ovarian cancer patients and serum from cancer-free individuals in the Food and Drug Administration and National Cancer Institute Clinical Proteomics Database, we undertake a comparative study of the net effect of the machine learning algorithm–feature selection tool–cutoff criteria combination on the performance as measured by an appropriate error rate measure.
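
The feature selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `select_features`, the synthetic data, the Welch (unequal-variance) form of the t-test, and the Bonferroni correction as the Type 1 error control are all assumptions made here for illustration, since the abstract does not say which variants were used.

```python
import numpy as np
from scipy import stats


def select_features(X, y, alpha=0.05):
    """Return (selected column indices, p-values) for marginal t-tests.

    X: (n_samples, n_features) intensity matrix, one column per m/z ratio.
    y: (n_samples,) labels, 1 = cancer, 0 = non-cancer.
    Hypothetical helper; Welch t-test and Bonferroni cutoff are assumptions.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    cancer, control = X[y == 1], X[y == 0]
    # Marginal two-sample t-test at every m/z ratio (column-wise).
    _, pvals = stats.ttest_ind(cancer, control, axis=0, equal_var=False)
    # Bonferroni: compare each p-value with alpha / (number of tests).
    cutoff = alpha / X.shape[1]
    return np.flatnonzero(pvals < cutoff), pvals


# Synthetic stand-in data (not the ovarian cancer data set):
# 60 samples, 200 m/z ratios, the first 5 shifted in the cancer group.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 200))
X[y == 1, :5] += 3.0

selected, pvals = select_features(X, y)
print(selected)
```

The selected columns would then be passed to whichever clustering or classification algorithm is being compared. Loosening or tightening `alpha` changes how many m/z ratios survive, which is the cutoff effect the study examines.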

Publisher
Database: Elsevier - ScienceDirect
Journal: Statistical Methodology - Volume 3, Issue 1, January 2006, Pages 79–92