Article ID: 433731
Journal: Theoretical Computer Science
Published Year: 2016
Pages: 16 Pages
File Type: PDF
Abstract

In this paper we define a new similarity measure, LCSk, which seeks the maximal number of length-k substrings that match in both input strings while preserving their order of appearance; the traditional LCS is the special case k = 1. We examine this generalization in both theory and practice. We first describe its basic solution and give experimental evidence on real data for its ability to differentiate between sequences that are considered similar according to the LCS measure. We then examine extensions of the LCSk definition to LCS in substrings of length at least k (LCS≥k) and to 2-dimensional LCSk, and also define the complementary EDk and ED≥k distances.
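
The LCSk definition above can be illustrated with a small dynamic program. The following is a minimal sketch and not necessarily the solution described in the paper: it assumes the matched length-k substrings must not overlap within either string, and the function name `lcsk` and the table names `run` and `best` are hypothetical, chosen only for this illustration.

```python
def lcsk(a: str, b: str, k: int) -> int:
    """Count the most length-k substrings that match in a and b in the same order.

    Assumption for this sketch: matched substrings do not overlap in either string.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    n, m = len(a), len(b)
    # run[i][j]: length of the longest common suffix of a[:i] and b[:j].
    run = [[0] * (m + 1) for _ in range(n + 1)]
    # best[i][j]: LCSk value for the prefixes a[:i] and b[:j].
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
            # Option 1: skip the last character of one of the prefixes.
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            # Option 2: a[i-k:i] equals b[j-k:j], so end a matched pair here.
            if run[i][j] >= k:
                best[i][j] = max(best[i][j], best[i - k][j - k] + 1)
    return best[n][m]


if __name__ == "__main__":
    print(lcsk("abcd", "axbxcxd", 1))  # classic LCS (k = 1)
    print(lcsk("abcd", "axbxcxd", 2))  # pairs of adjacent matching characters
```

With k = 1 the recurrence reduces to the classic LCS recurrence. The two calls in the usage example show how the measures can diverge: "abcd" is a full subsequence of "axbxcxd", so the k = 1 score is 4, yet the two strings share no common substring of length 2, so the k = 2 score is 0. The sketch runs in O(nm) time and space for inputs of lengths n and m.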
