Article ID Journal Published Year Pages File Type
431685 Journal of Discrete Algorithms 2008 13 Pages PDF
Abstract

Let s=s1..sns=s1..sn be a text (or sequence) on a finite alphabet Σ. A fingerprint in s   is the set of distinct characters contained in one of its substrings. Fingerprinting a text consists of computing the set FF of all fingerprints of all its substrings and being able to efficiently answer several questions on this set. A given fingerprint f∈Ff∈F is represented by a binary array, F  , of size |Σ||Σ| named a fingerprint table. A fingerprint, f∈Ff∈F, admits a number of maximal locations 〈i,j〉〈i,j〉 in S  , that is the alphabet of si..sjsi..sj is f   and si−1,sj+1si−1,sj+1, if defined, are not in f  . The set of maximal locations is L,|L|⩽n|Σ|. We present new algorithms and a new data structure for the three problems: (1) compute FF; (2) given F, answer if F   represents a fingerprint in FF; (3) given F, find all maximal locations of F in s  . These problems are, respectively, solved in O(n+|L|log|Σ|)O(n+|L|log|Σ|), Θ(|Σ|)Θ(|Σ|), and Θ(|Σ|+K)Θ(|Σ|+K) time—where K is the number of maximal locations of F.

Keywords
Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics
Authors
, ,