Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
433731 | Theoretical Computer Science | 2016 | 16 Pages |
Abstract
In this paper we define a new similarity measure, LCSk, which seeks the maximal number of length-k substrings that match in both input strings while preserving their order of appearance; the traditional LCS is the special case k = 1. We examine this generalization in both theory and practice. We first describe its basic solution and give experimental evidence on real data for its ability to differentiate between sequences that are considered similar according to the LCS measure. We then examine extensions of the LCSk definition to LCS in at least k-length substrings (LCS≥k) and 2-dimensional LCSk, and also define complementary EDk and ED≥k distances.
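To make the definition concrete, the following is a minimal sketch of a quadratic dynamic program for LCSk. It is an illustration under stated assumptions, not necessarily the basic solution described in the paper: the function name `lcsk`, the helper table `run`, and the reading that the matched length-k substrings must not overlap are all assumptions introduced here.

```python
def lcsk(a: str, b: str, k: int) -> int:
    """Count the maximal number of length-k substrings that match in a and b,
    in the same order (assumed here to be non-overlapping)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n, m = len(a), len(b)
    # run[i][j]: length of the longest common suffix of a[:i] and b[:j]
    run = [[0] * (m + 1) for _ in range(n + 1)]
    # dp[i][j]: LCSk value for the prefixes a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
            # Either skip the last character of a or the last character of b ...
            best = max(dp[i - 1][j], dp[i][j - 1])
            # ... or end a matching length-k block at positions i and j.
            if run[i][j] >= k:
                best = max(best, dp[i - k][j - k] + 1)
            dp[i][j] = best
    return dp[n][m]
```

With k = 1 the recurrence reduces to the classical LCS recurrence, so `lcsk("AGCAT", "GAC", 1)` gives the ordinary LCS length of the two strings, while larger k only rewards matches that come in contiguous blocks of k characters.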
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
G. Benson, A. Levy, S. Maimoni, D. Noifeld, B.R. Shalom