Identification of tandem repeats over large-alphabet inputs

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
392630	665145	2016	10 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Tandem repeats - Tandem تکرار می شود Primitive - اولیه Bioinformatics - بیوانفورماتیک Sequence analysis - تجزیه و تحلیل توالی Maximal - حداکثر

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی

پیش نمایش صفحه اول مقاله

Identification of tandem repeats over large-alphabet inputs

چکیده انگلیسی

• Makes use of simple list structure instead of suffix trees.
• Algorithm finds all the primitive maximal tandem repeats of all sizes.
• Theoretically the algorithm is linear in space.
• In practice the algorithm is linear in time and independent of the size of alphabet.
• Application in stringology and bioinformatics.

A primitive tandem repeat α is a substring in string S if it can be expressed as two or more contiguous copies of β, where the base β cannot be expressed in terms of yet a shorter substring. Substring α is maximal if there is no copy of β to either its left or right. Tandem repeats (or arrays) are known to play an important role in biology, e.g. determining parentage. We present a deterministic algorithm that finds all the exact primitive maximal tandem arrays that occur in S, where at iteration k it discovers all the repeats with base length k. Our algorithm uses a simple list structure for all its operations. In theory, the algorithm scales well with the size of the alphabet. For strings over the alphabet Σ it has a complexity of O(|S|+|Σ|)O(|S|+|Σ|) in space, and O(B|S|−B2/2)O(B|S|−B2/2) in time, where B is the length of the longest primitive base of a tandem repeat in S. Experimental results on real biological sequences, randomly generated sequences using large sized alphabets, and Fibonacci strings, show that in practice the algorithm has indeed a linear complexity, both in space and time.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Sciences - Volume 345, 1 June 2016, Pages 96–105

نویسندگان

Aqil M. Azmi,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Identification of tandem repeats over large-alphabet inputs

دسترسی سریع

ارتباط

English Website