Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
431633 | Journal of Discrete Algorithms | 2015 | 6 Pages |
Abstract
We consider the problem of retrieving the k documents from a collection of strings where a given pattern P appears least often. This has potential applications in data mining, bioinformatics, security, and big data. We show that adapting the classical linear-space solutions for this problem is trivial, but the compressed-space solutions are not easy to extend. We design a new solution for this problem that matches the best-known result when using 2|CSA|+o(n)2|CSA|+o(n) bits, where CSACSA is a Compressed Suffix Array. Our structure answers queries in the time needed by the CSACSA to find the suffix array interval of the pattern plus O(klgklgϵn)O(klgklgϵn) accesses to suffix array cells, for any constant ϵ>0ϵ>0.
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Gonzalo Navarro, Sharma V. Thankachan,