Article ID | Journal ID | Year of publication | English article
---|---|---|---
533338 | 870105 | 2013 | 13-page PDF
Biclustering is an important tool for finding patterns in a microarray data matrix by classifying genes and conditions simultaneously. In most existing biclustering algorithms, almost all genes and conditions take part in the clustering process, even those that contribute little to a bicluster. We instead propose to perform biclustering only on the genes and conditions related to a given bicluster type. Our algorithm proceeds in three steps. First, the gene expression matrix is partitioned into stable and unstable submatrices in both the row and column directions by inspecting the similarity between each row (or column) vector and the all-ones vector. Second, the genes and conditions related to a given bicluster type are extracted by inspecting row or column pairs in the corresponding stable or unstable submatrices. Finally, biclusters of any type are obtained by performing clustering analysis on the extracted genes and conditions. In addition, we present a novel strategy for estimating missing data in the gene expression matrix, based on the James–Stein and kernel estimation principles, in which the estimation matrix is obtained with the k-means algorithm. Experimental results show excellent performance of our algorithm in both missing-data estimation and biclustering.
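The abstract does not say which similarity measure or threshold separates "stable" from "unstable" rows and columns. The following is a minimal sketch of the first step that assumes cosine similarity to the all-ones vector and a hypothetical cut-off `tau`. The helper names `cosine_to_ones` and `split_stable_unstable` are illustrative and are not taken from the paper. A cosine close to 1 means the row or column is nearly constant. Note that this measure is sensitive to offsets and signs, which matters for log-ratio expression data.

```python
import numpy as np


def cosine_to_ones(M: np.ndarray, axis: int) -> np.ndarray:
    """Cosine similarity between each row (axis=1) or column (axis=0) of M
    and the all-ones vector of matching length."""
    n = M.shape[axis]                      # length of each row/column vector
    sums = M.sum(axis=axis)                # inner product <x, 1>
    norms = np.linalg.norm(M, axis=axis)   # ||x||
    # cos(x, 1) = <x, 1> / (||x|| * sqrt(n)); zero vectors get similarity 0
    return np.divide(sums, norms * np.sqrt(n),
                     out=np.zeros_like(norms), where=norms > 0)


def split_stable_unstable(M: np.ndarray, tau: float = 0.95) -> dict:
    """Hypothetical split: a row or column counts as 'stable' when its
    cosine similarity to the all-ones vector is at least tau."""
    row_sim = cosine_to_ones(M, axis=1)
    col_sim = cosine_to_ones(M, axis=0)
    return {
        "stable_rows": np.flatnonzero(row_sim >= tau),
        "unstable_rows": np.flatnonzero(row_sim < tau),
        "stable_cols": np.flatnonzero(col_sim >= tau),
        "unstable_cols": np.flatnonzero(col_sim < tau),
    }


# Toy 3 x 3 matrix: rows 0 and 2 are nearly constant, row 1 is not.
M = np.array([[2.0, 2.0, 2.0],
              [1.0, 5.0, 0.0],
              [3.0, 3.1, 2.9]])
parts = split_stable_unstable(M)
print(parts["stable_rows"], parts["unstable_rows"])  # prints [0 2] [1]
```

With `tau = 0.95`, none of the columns in this toy matrix counts as stable. Raising or lowering `tau` trades the size of the stable submatrix against how strictly "constant" is interpreted.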
► A novel strategy for estimating missing data is proposed.
► Estimating missing data based on James–Stein and kernel estimation principles.
► The gene expression matrix is partitioned into stable and unstable submatrices.
► The strategy eliminates interference from irrelevant genes or conditions.
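The abstract names the ingredients of the missing-data strategy (James–Stein shrinkage, kernel estimation, and a k-means "estimation matrix") but not how they are combined. The sketch below substitutes a simple construction and should not be read as the authors' estimator. It makes the following assumptions:

* Missing entries are first filled with column means.
* Plain k-means clusters the filled rows, and the estimation matrix `E` replaces each row by its cluster centroid.
* Each incomplete row is shrunk toward its centroid with the positive-part James–Stein factor, using a pooled residual variance.

The kernel-estimation component is omitted. The helpers `kmeans` and `js_impute` are hypothetical.

```python
import numpy as np


def _assign(X: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the nearest centroid (squared Euclidean distance) for each row."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's k-means on the rows of X; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # keep the previous centroid if a cluster becomes empty
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(X, centroids), centroids


def js_impute(M: np.ndarray, k: int = 2, seed: int = 0) -> np.ndarray:
    """Fill np.nan entries of a genes x conditions matrix M (illustrative only)."""
    mask = np.isnan(M)
    p = M.shape[1]
    assert p >= 3, "James-Stein shrinkage needs at least 3 coordinates"
    # 1) initial fill with column means (assumes each column has an observed value)
    X = np.where(mask, np.nanmean(M, axis=0), M)
    # 2) k-means: the estimation matrix E replaces each row by its centroid
    labels, centroids = kmeans(X, k, seed=seed)
    E = centroids[labels]
    resid = X - E
    # 3) pooled noise variance from observed entries (an assumption)
    sigma2 = resid[~mask].var()
    out = M.copy()
    for i in np.flatnonzero(mask.any(axis=1)):
        dist2 = float(resid[i] @ resid[i])
        # positive-part James-Stein factor; if the row equals its centroid, use E
        factor = max(0.0, 1.0 - (p - 2) * sigma2 / dist2) if dist2 > 0 else 0.0
        estimate = E[i] + factor * resid[i]   # shrink row i toward its centroid
        out[i, mask[i]] = estimate[mask[i]]   # overwrite only the missing cells
    return out


M = np.array([[1.0, 2.0, 3.0, 4.0],
              [1.1, 2.1, np.nan, 4.1],
              [9.0, 8.0, 7.0, 6.0],
              [9.1, 8.1, 7.1, np.nan]])
filled = js_impute(M, k=2)
print(filled)
```

Because the shrinkage factor lies in [0, 1], each imputed value is a convex combination of a centroid entry and a column mean. It therefore stays within the observed range of its column. Observed entries are never modified.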
Journal: Pattern Recognition - Volume 46, Issue 4, April 2013, Pages 1170–1182