Article ID: 5760132
Journal: Journal of Theoretical Biology
Published Year: 2017
Pages: 25
File Type: PDF
Abstract
We demonstrate an application of a core notion of information theory, typical sequences and their related properties, to the analysis of population genetic data. Based on the asymptotic equipartition property (AEP) for nonstationary discrete-time sources producing independent symbols, we introduce the concepts of typical genotypes and of population entropy and cross-entropy rates. We analyze typical genotypes from three perspectives: a set perspective on the interplay of typical sets of genotypes from two populations, a geometric perspective on their structure in high-dimensional space, and a statistical learning perspective on the prospects of constructing typical-set-based classifiers. In particular, we show that such classifiers are surprisingly resilient to noise originating from small population samples, and we highlight the potential for further links between inference and communication.
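To make the notion of a typical genotype concrete, the following is a minimal sketch and not the paper's own formulation. It assumes haploid biallelic loci coded 0/1, independence across loci, and known allele frequencies. The function names (`entropy_rate`, `cross_entropy_rate`, `log_prob_rate`, `is_typical`) and the tolerance `eps` are hypothetical, chosen only for illustration. Under these assumptions, a genotype is treated as typical for a population when its per-locus negative log-probability lies within `eps` of the population's entropy rate.

```python
import math


def _binary_entropy(p):
    """Entropy in bits of a single biallelic locus with allele frequency p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def entropy_rate(freqs):
    """Mean per-locus entropy (bits), assuming independent loci."""
    return sum(_binary_entropy(p) for p in freqs) / len(freqs)


def cross_entropy_rate(p_freqs, q_freqs):
    """Expected per-locus -log2 probability under population Q of a
    genotype drawn from population P. Assumes every q lies strictly
    between 0 and 1."""
    total = 0.0
    for p, q in zip(p_freqs, q_freqs):
        total -= p * math.log2(q) + (1 - p) * math.log2(1 - q)
    return total / len(p_freqs)


def log_prob_rate(genotype, freqs):
    """Per-locus -log2 probability of one observed genotype (0/1 alleles)."""
    total = 0.0
    for g, p in zip(genotype, freqs):
        q = p if g == 1 else 1 - p
        if q == 0.0:
            return math.inf  # genotype is impossible in this population
        total += math.log2(q)
    return -total / len(freqs)


def is_typical(genotype, freqs, eps):
    """True if the genotype's log-probability rate is within eps of the
    entropy rate (an illustrative typicality criterion)."""
    return abs(log_prob_rate(genotype, freqs) - entropy_rate(freqs)) <= eps


if __name__ == "__main__":
    uniform = [0.5] * 10   # every genotype is equally likely
    skewed = [0.9] * 10    # allele 1 is common at every locus
    print(is_typical([0, 1] * 5, uniform, 1e-9))
    print(is_typical([0] * 10, skewed, 0.1))
    print(cross_entropy_rate(skewed, uniform))
```

With uniform allele frequencies every genotype has the same probability, so all genotypes are typical. With skewed frequencies, a genotype made entirely of the rare allele has a log-probability rate far above the entropy rate and falls outside the typical set. A classifier in this spirit could assign a genotype to the population for which it is typical, though the details here are an assumption rather than the authors' construction.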
Related Topics
Life Sciences > Agricultural and Biological Sciences > Agricultural and Biological Sciences (General)