Article code: 975256
Journal code: 1645118
Publication year: 2014
English article: 11 pages, full text in PDF
English title of the ISI article
A genome signature derived from the interplay of word frequencies and symbol correlations
Related subjects
Engineering and Basic Sciences; Mathematics; Mathematical Physics
English abstract


• Statistical properties of DNA sequences come from a wide range of biological processes.
• We derive a novel genome signature combining word frequencies and symbol correlations.
• Our genome signature performs better in a metagenomics clustering example.
• It reveals strong differences in eukaryotic microsatellite distribution.

Genome signatures are statistical properties of DNA sequences that provide information on the underlying species. It is not understood how such species-discriminating statistical properties arise from processes of genome evolution and from functional properties of the DNA. Investigating the interplay of different genome signatures can contribute to this understanding. Here we analyze the statistical dependences between two such genome signatures: word frequencies and symbol correlations at short and intermediate distances. We formulate a statistical model of word frequencies in DNA sequences based on the observed symbol correlations and show that deviations of word counts from this correlation-based null model serve as a new genome signature. This signature (i) performs better in sorting DNA sequence segments according to their species of origin and (ii) reveals unexpected species differences in the composition of microsatellites, an important class of repetitive DNA. The first result concerns a typical task in metagenomics projects and is therefore an important benchmark for a genome signature; the second suggests strong species differences in the biological mechanisms of genome evolution. More generally, our results highlight that the choice of null model (here: word abundances computed from symbol correlations rather than from shorter word counts) substantially affects the interpretation of such statistical signals.
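The abstract does not spell out the correlation-based null model, so the Python sketch below only illustrates the general idea under our own assumptions. It compares observed k-word counts with two null models: a hypothetical correlation-based one (`correlation_null`), which multiplies single-symbol frequencies by pairwise correlation ratios p_d(a, b) / (p(a) p(b)) for every pair of positions in the word (a Kirkwood-style superposition approximation chosen here for illustration, not necessarily the authors' model), and the conventional Markov null model built from shorter word counts (`markov_null`). The deviation score (observed - expected) / sqrt(expected) in `signature` is likewise an assumed choice; all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def word_counts(seq, k):
    """Overlapping counts of all length-k words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def all_words(k):
    """All 4**k words over the DNA alphabet."""
    return ("".join(t) for t in product(ALPHABET, repeat=k))


def symbol_freqs(seq):
    """Single-symbol frequencies p(a)."""
    c = Counter(seq)
    return {a: c[a] / len(seq) for a in ALPHABET}


def pair_freqs(seq, d):
    """Joint frequencies p_d(a, b) of symbols a, b found d positions apart."""
    pairs = Counter(zip(seq, seq[d:]))
    total = len(seq) - d
    return {(a, b): pairs[(a, b)] / total for a in ALPHABET for b in ALPHABET}


def correlation_null(seq, k):
    """Assumed correlation-based null model (Kirkwood-style superposition):
    P(w) ~ prod_i p(w_i) * prod_{i<j} p_{j-i}(w_i, w_j) / (p(w_i) p(w_j)),
    renormalised over all 4**k words and scaled to expected counts."""
    p1 = symbol_freqs(seq)
    pair_by_dist = {d: pair_freqs(seq, d) for d in range(1, k)}
    raw = {}
    for w in all_words(k):
        prob = math.prod(p1[a] for a in w)
        for i in range(k):
            for j in range(i + 1, k):
                denom = p1[w[i]] * p1[w[j]]
                # correlation ratio for symbols w[i], w[j] at distance j - i
                prob *= pair_by_dist[j - i][(w[i], w[j])] / denom if denom > 0 else 0.0
        raw[w] = prob
    total = sum(raw.values())
    n_words = len(seq) - k + 1
    return {w: n_words * p / total for w, p in raw.items()} if total > 0 else raw


def markov_null(seq, k):
    """Conventional null model from shorter words (order k-2 Markov chain):
    E[w] = N(w[:-1]) * N(w[1:]) / N(w[1:-1])."""
    if k < 3:
        raise ValueError("markov_null needs k >= 3")
    n_short = word_counts(seq, k - 1)
    n_mid = word_counts(seq, k - 2)
    out = {}
    for w in all_words(k):
        mid = n_mid[w[1:-1]]
        out[w] = n_short[w[:-1]] * n_short[w[1:]] / mid if mid else 0.0
    return out


def signature(seq, k, null_model):
    """Deviation of observed word counts from a null model as z-like scores;
    words with zero expected count are skipped."""
    observed = word_counts(seq, k)
    expected = null_model(seq, k)
    return {w: (observed[w] - e) / math.sqrt(e) for w, e in expected.items() if e > 0}


if __name__ == "__main__":
    # Toy sequence with an embedded (CA)n microsatellite.
    toy = "ATGCGTACGTTAGC" * 20 + "CA" * 30 + "GGATCCTTAGACGT" * 20
    for name, model in [("correlation", correlation_null), ("markov", markov_null)]:
        sig = signature(toy, 3, model)
        top = sorted(sig, key=lambda w: abs(sig[w]), reverse=True)[:5]
        print(name, [(w, round(sig[w], 1)) for w in top])
```

Running the script prints, for each null model, the five 3-letter words that deviate most from their expected counts. Comparing the two lists on a sequence containing a microsatellite is a quick way to see how the choice of null model changes which words stand out, which is the general point the abstract makes; the sketch makes no claim about the paper's actual results.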

Publisher
Database: Elsevier - ScienceDirect
Journal: Physica A: Statistical Mechanics and its Applications - Volume 414, 15 November 2014, Pages 216–226
Authors
, , ,