کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
531284 869825 2011 11 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
A sum-over-paths extension of edit distances accounting for all sequence alignments
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر چشم انداز کامپیوتر و تشخیص الگو
پیش نمایش صفحه اول مقاله
A sum-over-paths extension of edit distances accounting for all sequence alignments
چکیده انگلیسی

This paper introduces a simple Sum-over-Paths (SoP) formulation of string edit distances accounting for all possible alignments between two sequences, and extends related previous work from bioinformatics to the case of graphs with cycles. Each alignment ℘℘, with a total cost C(℘)C(℘), is assigned a probability of occurrence P(℘)=exp[−θC(℘)]/ZP(℘)=exp[−θC(℘)]/Z where ZZ is a normalization factor. Therefore, good alignments (having a low cost) are favored over bad alignments (having a high cost). The expected cost ∑℘∈PC(℘)exp[−θC(℘)]/Z∑℘∈PC(℘)exp[−θC(℘)]/Z computed over all possible alignments ℘∈P℘∈P defines the SoP edit distance. When θ→∞θ→∞, only the best alignments matter and the measure reduces to the standard edit distance. The rationale behind this definition is the following: for some applications, two sequences sharing many good alignments should be considered as more similar than two sequences having only one single good, optimal, alignment in common. In other words, sub-optimal alignments could also be taken into account. Forward/backward recurrences allowing to efficiently compute the expected cost are developed. Virtually any Viterbi-like sequence comparison algorithm computed on a lattice can be generalized in the same way; for instance, a SoP longest common subsequence is also developed. Pattern classification tasks performed on five data sets show that the new measures usually outperform the standard ones and, in any case, never perform significantly worse, at the expense of tuning the parameter θθ.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Pattern Recognition - Volume 44, Issue 6, June 2011, Pages 1172–1182
نویسندگان
, , , ,