Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4496953 | Journal of Theoretical Biology | 2011 | 11 Pages |
Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of the widely used alignment-free comparison statistic D2 and its variants D2⁎ and D2s showed that, under a pattern transfer model, their power approaches a limit smaller than 1 as the sequence length tends to infinity. We develop new alignment-free statistics based on D2, D2⁎ and D2s by comparing local sequence pairs and then summing over all local sequence pairs of a certain length. We show that the new statistics are much more powerful than the corresponding original statistics and that their power tends to 1 as the sequence length tends to infinity under the pattern transfer model.
► Developed new powerful alignment-free statistics to test if two sequences are related through a pattern transfer model.
► The power of the statistics increases to 1 as sequence length tends to infinity.
► Developed new similarity measures between two sequences to measure their similarity.
► The similarity measures are highly associated with sequence similarity score based on alignment.
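
To make the idea of "comparing local sequence pairs and then summing" concrete, the following is a minimal Python sketch, not the authors' implementation. It relies on the standard definition of D2 as the number of k-word matches between two sequences, that is, the sum over all k-words w of (count of w in the first sequence) × (count of w in the second). The windowing scheme (non-overlapping windows of length `m`, trailing remainder dropped) and the names `kmer_counts`, `d2`, `windows` and `local_sum_d2` are assumptions made for illustration; the abstract does not specify how the local sequence pairs are chosen, and the D2⁎ and D2s variants are not sketched here.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """D2: sum over k-words w of (count of w in a) * (count of w in b)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for words absent from b, so they contribute nothing.
    return sum(n * cb[w] for w, n in ca.items())


def windows(seq, m):
    """Non-overlapping windows of length m (assumed scheme; remainder dropped)."""
    return [seq[i:i + m] for i in range(0, len(seq) - m + 1, m)]


def local_sum_d2(a, b, k, m):
    """Sum D2 over every pair of local windows, one from a and one from b."""
    return sum(d2(u, v, k) for u in windows(a, m) for v in windows(b, m))
```

For example, `d2("ACGT", "ACGT", 2)` counts the shared 2-words AC, CG and GT once each, while `local_sum_d2(a, b, k, m)` applies the same count to every pair of length-`m` windows and adds up the results.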