The inclusion problem for regular expressions

Article ID	Journal	Published Year	Pages	File Type
429600	Journal of Computer and System Sciences	2012	19 Pages	PDF

Abstract

This paper presents a polynomial-time algorithm for the inclusion problem for a large class of regular expressions. The algorithm is not based on construction of finite automata, and can therefore be faster than the lower bound implied by the Myhill–Nerode theorem. The algorithm automatically discards irrelevant parts of the right-hand expression. The irrelevant parts of the right-hand expression might even be 1-ambiguous. For example, if r is a regular expression such that any DFA recognizing r is very large, the algorithm can still, in time independent of r, decide that the language of ab is included in that of (a+r)b(a+r)b. The algorithm is based on a syntax-directed inference system. It takes arbitrary regular expressions as input. If the 1-ambiguity of the right-hand expression becomes a problem, the algorithm will report this. Otherwise, it will decide the inclusion problem for the input.

► We present a polynomial algorithm for the inclusion problem for regular expressions. ► If the right-hand expression is 1-unambiguous, the algorithm gives the correct answer. ► Otherwise, it may give the correct answer, or no answer. ► The algorithm automatically discards unneeded parts of the right-hand expression. ► It may be faster than the lower bounds given by the Myhill–Nerode theorem.

Keywords

Formal languages Regular expressions Inclusion