Article ID Journal Published Year Pages File Type
438901 Theoretical Computer Science 2006 28 Pages PDF
Abstract

In the fields of data mining and knowledge discovery, many semistructured data such as HTML/XML files are represented by rooted trees t such that all children of each internal vertex of t are ordered and t has edge labels. In order to represent structural features common to such semistructured data, we propose a linear ordered term tree, which is a rooted tree pattern consisting of ordered tree structures and internal structured variables with distinct variable labels. For a set of edge labels Λ, let be the set of all linear ordered term trees. For a linear ordered term tree t in , the term tree language of t, denoted by LΛ(t), is the set of all ordered trees obtained from t by substituting arbitrary ordered trees for all variables in t. Given a set of ordered trees S, the minimal language problem for is to find a linear ordered term tree t in such that LΛ(t) is minimal among all term tree languages which contain all ordered trees in S. We show that the class is polynomial time inductively inferable from positive data, by giving a polynomial time algorithm for solving the minimal language problem for .

Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics