Article code: 6939858
Journal code: 869876
Publication year: 2016
English article: 12 pages, PDF
Full-text version: Free download
English title of the ISI article
Clustering of cell populations in flow cytometry data using a combination of Gaussian mixtures
Persian translation of the title
خوشه بندی جمعیت های سلولی در داده های فلوسایتومتری با استفاده از ترکیب مخلوط های گاوسی
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
English abstract
We propose a supervised learning approach to automatic quantification of cell populations in flow cytometric samples. One sample contains up to millions of measurement vectors with a dimensionality between 10 and 20. Normally, each measurement vector corresponds to a single cell in the biological sample. Identifying biologically meaningful cell populations is essentially a clustering problem; however, standard clustering methods are impractical, because the size, shape and location of corresponding clusters may vary strongly between samples, mainly due to phenotypic differences and inter-laboratory variations. In our holistic approach, we implicitly employ the structural information (such as relative locations and shapes of sub-populations). A new input sample is reconstructed by a linear combination of artificial reference samples, each represented by a Gaussian Mixture Model (GMM) in which, for each Gaussian component, the class label of the corresponding cluster of observations is known. The reference samples are calculated from a larger set of training samples by non-negative matrix factorization and can be regarded as the basis of a lower-dimensional feature space, in which input samples are reconstructed. We show a method for calculating the feature space transformation based on minimizing the L2 distance defined between two GMMs. The feature space representation of the sample is then used to assign each observation to one of the specified sub-populations by a Bayes decision. We present classification results on a database of about 170 patients with Acute Lymphoblastic Leukemia (ALL), where high accuracy in the prediction of relatively small leukemic populations is crucial. The approach is not limited to our application. It can be employed wherever analysis of large, multi-dimensional, numerical data of a specific class of samples with related structure has to be performed.
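Two building blocks named in the abstract, the L2 distance between two GMMs and the Bayes decision over labeled Gaussian components, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It relies on the standard closed-form integral of a product of two Gaussians, and it simplifies the Bayes decision to picking the single component with the highest posterior. The function names, the `(weights, means, covariances)` tuple layout, and the toy labels are assumptions made for illustration; the non-negative matrix factorization step is not covered.

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """Density of the multivariate normal N(mean, cov) evaluated at x."""
    d = mean.shape[0]
    diff = x - mean
    maha = diff @ np.linalg.solve(cov, diff)   # squared Mahalanobis distance
    log_det = np.linalg.slogdet(cov)[1]        # log|cov|, numerically stable
    return np.exp(-0.5 * (maha + d * np.log(2.0 * np.pi) + log_det))

def gmm_inner(w_a, mu_a, cov_a, w_b, mu_b, cov_b):
    """Integral of the product of two GMM densities.

    Uses the identity: integral of N(x; m1, S1) * N(x; m2, S2) dx
    equals N(m1; m2, S1 + S2).
    """
    total = 0.0
    for wi, mi, ci in zip(w_a, mu_a, cov_a):
        for wj, mj, cj in zip(w_b, mu_b, cov_b):
            total += wi * wj * gauss_pdf(mi, mj, ci + cj)
    return total

def gmm_l2_sq(a, b):
    """Squared L2 distance between GMMs a and b (each a (w, mu, cov) tuple)."""
    return gmm_inner(*a, *a) - 2.0 * gmm_inner(*a, *b) + gmm_inner(*b, *b)

def bayes_labels(x, w, mu, cov, comp_labels):
    """Label each row of x with the class of its most probable component.

    Simplification (assumption): one component per decision, rather than
    summing posteriors over all components that share a class label.
    """
    scores = np.array([[wk * gauss_pdf(xi, mk, ck)
                        for wk, mk, ck in zip(w, mu, cov)] for xi in x])
    return np.asarray(comp_labels)[scores.argmax(axis=1)]

# Toy two-component GMMs in 2-D (hypothetical values).
covs = np.array([np.eye(2), np.eye(2)])
gmm_a = (np.array([0.6, 0.4]), np.array([[0.0, 0.0], [3.0, 3.0]]), covs)
gmm_b = (np.array([0.6, 0.4]), np.array([[0.5, 0.0], [3.0, 2.0]]), covs)

dist_same = gmm_l2_sq(gmm_a, gmm_a)
dist_diff = gmm_l2_sq(gmm_a, gmm_b)
points = np.array([[0.1, -0.1], [2.9, 3.2]])
labels = bayes_labels(points, *gmm_a, ["blast", "normal"])
```

In this toy setup the distance of a mixture to itself is zero up to floating-point error, the distance to the shifted mixture is positive, and each point receives the label of the component whose mean it lies closest to.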
Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition - Volume 60, December 2016, Pages 1029-1040