|Article code||Journal code||Publication year||English article||Persian translation||Full-text version|
|570565||876839||2017||18 pages (PDF)||Not available||Download|
• A novel mutual information estimator for both discrete and continuous variables is proposed.
• For large n, the estimator takes a positive value if and only if the two variables are not independent.
• The estimation completes in time quasi-polynomial in the sample size.
• The paper demonstrates that the estimator finds relations between gene expression and SNP values.
This paper proposes an estimator of mutual information for both discrete and continuous variables and applies it to the Chow–Liu algorithm to find a forest that expresses the probabilistic relations among them. The state-of-the-art assumes that the continuous variables are Gaussian and that the graphical model over discrete and continuous variables is ANOVA. Consequently, it is difficult to obtain the maximum likelihood of three connected variables in which the center is Gaussian and the other two are discrete, and thus the state-of-the-art restricts the class to forests in which no Gaussian node lies between discrete variables. The proposed method runs in a general setting without any such assumptions: it prepares several meshes, computes the mutual information value for each, and selects the maximum. We prove that the number of meshes to be prepared is at most $O(\log_2 n)$ and that, for large n, the estimated mutual information is no larger than zero if and only if the variables are independent. Finally, we apply the proposed method to the problems of gene differential analysis and relation discovery between gene expression and SNPs (single nucleotide polymorphisms). In particular, for the latter experiment, we demonstrate that the proposed method successfully captures the relation between them whereas the state-of-the-art fails, which illustrates the merits and demerits of the proposed and existing methods.
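The mesh-based procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's estimator. The equal-frequency binning, the BIC-style penalty `(a - 1) * (b - 1) * log(n) / (2 * n)`, the depth limit, and all function names (`plugin_mi`, `quantile_bins`, `penalized_mi`, `chow_liu_forest`) are assumptions chosen for illustration. The sketch only shows the general shape: try a logarithmic number of meshes per continuous variable, keep the maximum penalized score, and link variables into a forest using only positive scores.

```python
import math
from collections import Counter


def plugin_mi(xs, ys):
    """Empirical (plug-in) mutual information, in nats, of two discrete sequences."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log(c * n / (cx[x] * cy[y]))
               for (x, y), c in cxy.items())


def quantile_bins(values, k):
    """Assign each value to one of 2**k equal-frequency bins by its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * (2 ** k) // n
    return bins


def penalized_mi(xs, ys, x_continuous=True, y_continuous=True):
    """Maximum over meshes of (plug-in MI minus an assumed BIC-style penalty)."""
    n = len(xs)
    max_depth = max(1, int(math.log2(n)) // 2)  # assumed limit: O(log n) meshes
    depths_x = range(1, max_depth + 1) if x_continuous else [None]
    depths_y = range(1, max_depth + 1) if y_continuous else [None]
    best = -math.inf
    for kx in depths_x:
        bx = list(xs) if kx is None else quantile_bins(xs, kx)
        for ky in depths_y:
            by = list(ys) if ky is None else quantile_bins(ys, ky)
            a, b = len(set(bx)), len(set(by))
            penalty = (a - 1) * (b - 1) * math.log(n) / (2 * n)
            best = max(best, plugin_mi(bx, by) - penalty)
    return best


def chow_liu_forest(columns, continuous):
    """Kruskal-style maximum-weight forest using only pairs with positive score."""
    p = len(columns)
    edges = []
    for i in range(p):
        for j in range(i + 1, p):
            s = penalized_mi(columns[i], columns[j], continuous[i], continuous[j])
            if s > 0:  # a nonpositive score is treated as independence
                edges.append((s, i, j))
    edges.sort(reverse=True)

    parent = list(range(p))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge does not create a cycle
            parent[ri] = rj
            forest.append((i, j))
    return forest
```

Because pairs with nonpositive scores are dropped before the spanning step, the result is a forest rather than a single spanning tree, and no Gaussian or ANOVA assumption enters the sketch.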
Journal: International Journal of Approximate Reasoning - Volume 80, January 2017, Pages 1–18