کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
429068 | 687030 | 2010 | 4 صفحه PDF | دانلود رایگان |

The problem of pattern matching with wildcards is to find all the occurrences of a pattern of length m in a text of length n over a finite alphabet Σ (both the text and the pattern are allowed to contain wildcards). Based on the prime number encoding scheme (Chaim Linhart, Ron Shamir, Faster pattern matching with character classes using prime number encoding, J. Comput. Syst. Sci. 75 (3) (2009) 155–162), we present a new integer encoding and an efficient fast Fourier transforms based algorithm for this problem. The algorithm takes O(nlogm) time to search the pattern in the text by computing one convolution. For matching with wildcards, our encoding uses fewer prime numbers and has shorter code words comparing with the prime number encoding. We use at most 2lg|Σ|2lg|Σ| prime numbers to encode the symbols while in the prime number encoding |Σ||Σ| prime numbers are required. This number reduces to 1.5lg|Σ|1.5lg|Σ| when |Σ|>40|Σ|>40. The code word used in the algorithm is at most 2⌊lg|Σ|⌋⌈lg(5m)⌉2⌊lg|Σ|⌋⌈lg(5m)⌉ bits while in the prime encoding it is at least |Σ|lgm bits. We also show that the length of words can be further reduced by increasing the number of convolutions computed.
Research highlights
► Efficient pattern matching with wildcards by one convolution using words with short length.
► An integer number encoding using a small number of prime numbers.
► Reducing the word size by computing more than one convolution.
Journal: Information Processing Letters - Volume 110, Issue 24, 30 November 2010, Pages 1099–1102