کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
430014 687781 2014 13 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
String analysis by sliding positioning strategy
ترجمه فارسی عنوان
تجزیه و تحلیل رشته با کشیدن موقعیت استراتژی
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
چکیده انگلیسی


• We define new data structures and algorithms for string mining.
• We analyze small databases of long strings instead a vast database of small strings.
• The new structures let us process strings including Short Tandem Repeats.
• The proposed algorithm can calculate different interestingness measures efficiently.

Discovering frequent factors from long strings is an important problem in many applications, such as biosequence mining. In classical approaches, the algorithms process a vast database of small strings. However, in this paper we analyze a small database of long strings. The main difference resides in the high number of patterns to analyze. To tackle the problem, we have developed a new algorithm for discovering frequent factors in long strings. We present an Apriori-like solution which exploits the fact that any super-pattern of a non-frequent pattern cannot be frequent. The SANSPOS algorithm does a multiple-pass, candidate generation and test approach. Multiple length patterns can be generated in a pass. This algorithm uses a new data structure to arrange nodes in a trie. A Positioning Matrix is defined as a new positioning strategy. By using Positioning Matrices, we can apply advanced prune heuristics in a trie with a minimal computational cost. The Positioning Matrices let us process strings including Short Tandem Repeats and calculate different interestingness measures efficiently. Furthermore, in our algorithm we apply parallelism to transverse different sections of the input strings concurrently, speeding up the resulting running time. The algorithm has been successfully used in natural language and biological sequence contexts.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Computer and System Sciences - Volume 80, Issue 1, February 2014, Pages 3-15