A new technique for generating pathogenic barcodes in breast cancer susceptibility analysis

Article ID	Journal	Published Year	Pages	File Type
4496072	Journal of Theoretical Biology	2015	7 Pages	PDF

Abstract

• A criterion of maximum dissimilarity–minimum entropy is proposed for identifying pathogenic barcodes.
• Low entropy indicates a relatively consistent disease-causing pattern in case samples.
• Large dissimilarity indicates a significant distinction between cases and controls.
• Pathogenic barcodes with large dissimilarity and a consistent pattern in cases are risky.
• Statistically, if a shorter barcode contributes to a complex disease, that disease may be more common in the population.
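The highlights name two quantities, entropy within cases and dissimilarity between cases and controls, without giving formulas. The Python sketch below shows one way to compute them for a candidate set of SNPs. It assumes genotypes coded 0/1/2, treats a barcode as the tuple of genotype codes at the chosen SNPs, uses Shannon entropy (in bits) of the case barcode distribution, and uses total variation distance as the dissimilarity. The distance measure, the ranking rule (larger dissimilarity first, then lower entropy), and all function names are hypothetical illustrations, not the MDME formulas from the paper.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Assumption: each sample is a tuple of genotype codes (0, 1, 2), one per SNP.
Sample = Tuple[int, ...]
Barcode = Tuple[int, ...]


def barcode_distribution(samples: Sequence[Sample], snp_idx: Sequence[int]) -> Dict[Barcode, float]:
    """Relative frequency of each barcode (genotype combination at snp_idx) in one group."""
    counts = Counter(tuple(s[i] for i in snp_idx) for s in samples)
    n = len(samples)
    return {bc: c / n for bc, c in counts.items()}


def entropy_bits(dist: Dict[Barcode, float]) -> float:
    """Shannon entropy in bits; low values mean the group concentrates on few barcodes."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)


def total_variation(p: Dict[Barcode, float], q: Dict[Barcode, float]) -> float:
    """Assumed dissimilarity: half the L1 distance between two barcode distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def score_snp_set(cases: Sequence[Sample], controls: Sequence[Sample],
                  snp_idx: Sequence[int]) -> Tuple[float, float]:
    """Return (dissimilarity between cases and controls, entropy within cases)."""
    case_dist = barcode_distribution(cases, snp_idx)
    ctrl_dist = barcode_distribution(controls, snp_idx)
    return total_variation(case_dist, ctrl_dist), entropy_bits(case_dist)


def rank_snp_sets(cases: Sequence[Sample], controls: Sequence[Sample],
                  k: int) -> List[Tuple[Tuple[int, ...], float, float]]:
    """Score every k-SNP set exhaustively; hypothetical rule: max dissimilarity, then min entropy."""
    n_snps = len(cases[0])
    scored = []
    for snp_idx in combinations(range(n_snps), k):
        dissim, ent = score_snp_set(cases, controls, snp_idx)
        scored.append((snp_idx, dissim, ent))
    return sorted(scored, key=lambda t: (-t[1], t[2]))


# Toy data (invented for illustration): 4 cases and 4 controls, 3 SNPs each.
toy_cases = [(0, 2, 1), (0, 2, 0), (0, 2, 1), (0, 2, 2)]
toy_controls = [(1, 0, 1), (2, 1, 0), (1, 0, 2), (0, 1, 1)]
for idx, dissim, ent in rank_snp_sets(toy_cases, toy_controls, k=2):
    print(idx, f"dissimilarity={dissim:.2f}", f"entropy={ent:.2f}")
# (0, 1) dissimilarity=1.00 entropy=0.00
# (1, 2) dissimilarity=1.00 entropy=1.50
# (0, 2) dissimilarity=0.75 entropy=1.50
```

On this toy data the SNP sets (0, 1) and (1, 2) separate cases from controls equally well, but only (0, 1) shows a single consistent case barcode, which mirrors the intuition behind adding a minimum-entropy criterion to maximum dissimilarity. Exhaustive enumeration like this is only practical for small k; the abstract notes that larger searches run into combinatorial explosion.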

Complex diseases usually involve complex interactions between multiple loci. Artificial intelligence algorithms are a plausible strategy for avoiding combinatorial explosion. However, the randomness of the solutions these algorithms produce decreases biological researchers' confidence in them. Meanwhile, the lack of an efficient and effective measure to profile the distribution of cases and controls impedes the discovery of pathogenic epistasis. Here we present an efficient method called maximum dissimilarity–minimum entropy (MDME) to analyze breast cancer single-nucleotide polymorphism (SNP) data. The method searches for risky barcodes, that is, barcodes that increase the odds ratio and relative risk of breast cancer. It is based on the hypothesis that if a specific barcode is associated with a disease, then the barcode distinguishes cases from controls and, more importantly, shows a relatively consistent pattern in cases. An analysis of a simulated dataset explains the necessity of minimum entropy. Experimental results show that our method can find the riskiest barcode contributing to breast cancer susceptibility. Our method may also uncover several pathogenic barcodes underlying different subtypes of the cancer.
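To make the odds ratio and relative risk mentioned above concrete, the sketch below builds the usual 2×2 table for one barcode (carriers versus non-carriers among cases and controls) and applies the textbook formulas OR = ad/(bc) and RR = [a/(a+b)]/[c/(c+d)]. The 0/1/2 genotype coding, the helper names, and the Haldane–Anscombe 0.5 correction for zero cells are illustrative assumptions, not details taken from the paper. In a case-control design the "risk" computed here is only the proportion of cases within each group of the sample, so the RR is a descriptive ratio; the OR is the measure such designs estimate directly.

```python
from typing import Sequence, Tuple

Sample = Tuple[int, ...]  # assumption: genotype codes (0, 1, 2), one per SNP


def two_by_two(cases: Sequence[Sample], controls: Sequence[Sample],
               snp_idx: Sequence[int], barcode: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d): cases with, controls with, cases without, controls without the barcode."""
    target = tuple(barcode)

    def carries(sample: Sample) -> bool:
        # A sample carries the barcode if its genotypes at snp_idx match it exactly.
        return tuple(sample[i] for i in snp_idx) == target

    a = sum(1 for s in cases if carries(s))
    b = sum(1 for s in controls if carries(s))
    return a, b, len(cases) - a, len(controls) - b


def odds_ratio_and_relative_risk(a: float, b: float, c: float, d: float,
                                 correction: float = 0.5) -> Tuple[float, float]:
    """OR = ad/(bc); RR = [a/(a+b)] / [c/(c+d)]. Adds `correction` to every cell if any cell is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    relative_risk = (a / (a + b)) / (c / (c + d))
    return odds_ratio, relative_risk


# Toy data (invented): 30 of 50 cases and 10 of 50 controls carry barcode (0, 2) at SNPs 0 and 1.
toy_cases = [(0, 2, 1)] * 30 + [(1, 0, 1)] * 20
toy_controls = [(0, 2, 1)] * 10 + [(1, 0, 1)] * 40
table = two_by_two(toy_cases, toy_controls, snp_idx=(0, 1), barcode=(0, 2))
or_value, rr_value = odds_ratio_and_relative_risk(*table)
print(table, f"OR = {or_value:.2f}, RR = {rr_value:.2f}")  # (30, 10, 20, 40) OR = 6.00, RR = 2.25
```

A barcode picked by a dissimilarity/entropy search can be passed to `two_by_two` to report how strongly it is enriched in cases; the zero-cell correction keeps the ratios finite when a barcode never appears in one group.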

Keywords

Entropy; Epistasis; Breast cancer; Odds ratio; Single-nucleotide polymorphism