کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
437497 690149 2016 10 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
A really simple approximation of smallest grammar
ترجمه فارسی عنوان
تقریب ساده ای از کوچکترین دستور زبان
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
چکیده انگلیسی

In this paper we present a really   simple linear-time algorithm constructing a context-free grammar of size 4glog3/2⁡(N/g)4glog3/2⁡(N/g) for the input string, where N is the size of the input string and g   the size of the optimal grammar generating this string. The algorithm works for arbitrary size alphabets, but the running time is linear assuming that the alphabet Σ of the input string can be identified with numbers from {1,…,Nc}{1,…,Nc} for some constant c. Algorithms with such an approximation guarantee and running time are known, however all of them were non-trivial and their analyses were involved. The here presented algorithm computes the LZ77 factorisation and transforms it in phases to a grammar. In each phase it maintains an LZ77-like factorisation of the word with at most ℓ   factors as well as additional O(ℓ)O(ℓ) letters, where ℓ   was the size of the original LZ77 factorisation. In one phase in a greedy way (by a left-to-right sweep and a help of the factorisation) we choose a set of pairs of consecutive letters to be replaced with new symbols, i.e. nonterminals of the constructed grammar. We choose at least 2/3 of the letters in the word and there are O(ℓ)O(ℓ) many different pairs among them. Hence there are O(log⁡N)O(log⁡N) phases, each of them introduces O(ℓ)O(ℓ) nonterminals to a grammar. A more precise analysis yields a bound ℓ+4ℓlog⁡(N/ℓ)ℓ+4ℓlog⁡(N/ℓ). As ℓ≤gℓ≤g, this yields the desired bound g+4glog⁡(N/g)g+4glog⁡(N/g).

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Theoretical Computer Science - Volume 616, 22 February 2016, Pages 141–150
نویسندگان
,