Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
5760132 | Journal of Theoretical Biology | 2017 | 25 Pages |
Abstract
We demonstrate an application of a core notion of information theory, typical sequences and their related properties, to the analysis of population genetic data. Based on the asymptotic equipartition property (AEP) for nonstationary discrete-time sources producing independent symbols, we introduce the concepts of typical genotypes, population entropy, and cross-entropy rate. We analyze three perspectives on typical genotypes: a set perspective on the interplay of typical sets of genotypes from two populations, a geometric perspective on their structure in high-dimensional space, and a statistical learning perspective on the prospects of constructing typical-set-based classifiers. In particular, we show that such classifiers have a surprising resilience to noise originating from small population samples, and we highlight the potential for further links between inference and communication.
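To make the notion of a typical genotype concrete, the sketch below may help. It is an illustration under stated assumptions, not the authors' implementation: loci are treated as independent biallelic SNPs whose genotypes (allele counts 0, 1, 2) follow Hardy-Weinberg proportions, and all function names (`entropy_rate`, `cross_entropy_rate`, `is_typical`, `classify`) and the simulated allele frequencies are hypothetical. A genotype is called typical for a population when its per-locus negative log-probability lies within `eps` of that population's entropy rate; the classification rule shown (choose the population with the smaller gap) is one plausible typical-set-style rule and may differ from the classifiers analyzed in the paper.

```python
import numpy as np

def locus_probs(p):
    """Genotype probabilities for allele counts 0, 1, 2 at each locus.
    Assumes Hardy-Weinberg proportions and frequencies strictly in (0, 1)."""
    p = np.asarray(p, dtype=float)
    return np.stack([(1 - p) ** 2, 2 * p * (1 - p), p ** 2], axis=1)

def entropy_rate(p):
    """Average per-locus entropy (bits) of a population with frequencies p."""
    probs = locus_probs(p)
    return float(-(probs * np.log2(probs)).sum(axis=1).mean())

def cross_entropy_rate(p, q):
    """Average per-locus cross entropy (bits) of population p relative to q."""
    return float(-(locus_probs(p) * np.log2(locus_probs(q))).sum(axis=1).mean())

def log_prob_rate(genotype, p):
    """Per-locus negative log2-probability of one genotype under frequencies p."""
    probs = locus_probs(p)
    idx = np.arange(probs.shape[0])
    return float(-np.log2(probs[idx, np.asarray(genotype)]).mean())

def is_typical(genotype, p, eps):
    """True if the genotype's log-probability rate is within eps of the entropy rate."""
    return abs(log_prob_rate(genotype, p) - entropy_rate(p)) < eps

def classify(genotype, p_a, p_b):
    """Assign the genotype to the population for which it is 'more typical'."""
    gap_a = abs(log_prob_rate(genotype, p_a) - entropy_rate(p_a))
    gap_b = abs(log_prob_rate(genotype, p_b) - entropy_rate(p_b))
    return "A" if gap_a <= gap_b else "B"

# Simulated example (hypothetical data): two populations, 2000 loci.
rng = np.random.default_rng(0)
n_loci = 2000
p_a = rng.uniform(0.05, 0.95, n_loci)
p_b = rng.uniform(0.05, 0.95, n_loci)
genotype = rng.binomial(2, p_a)  # one individual drawn from population A

print("typical for A:", is_typical(genotype, p_a, eps=0.1))
print("typical for B:", is_typical(genotype, p_b, eps=0.1))
print("assigned to:", classify(genotype, p_a, p_b))
```

Because the loci are independent but not identically distributed, the log-probability rate concentrates around the entropy rate as the number of loci grows, which is the AEP setting the abstract refers to. In practice the allele frequencies would be estimated from population samples rather than known exactly.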
Related Topics
Life Sciences
Agricultural and Biological Sciences
Agricultural and Biological Sciences (General)
Authors
Omri Tal, Tat Dat Tran, Jacobus Portegies