Article code: 9471007 | Journal code: 1320054 | Publication year: 2005 | English article: 11 pages, PDF | Full-text version: free download
English title of the ISI article
SVM classification of human intergenic and gene sequences
Related topics
Life Sciences and Biotechnology; Agricultural and Biological Sciences; Agricultural and Biological Sciences (General)
Abstract
Despite constant improvement in prediction accuracy, gene-finding programs are still unable to provide automatic gene discovery with the desired correctness. This paper presents an analysis of gene and intergenic sequences from the point of view of language analysis, where gene and intergenic regions are regarded as two different subjects written in the four-letter alphabet {A, C, G, T}, and high frequency simple sequences are taken as keywords. A measurement α(l(τ)) was introduced to describe the relative repeat ratio of simple sequences. Threshold values were found for keyword selections. After eliminating 'noise', 178 short sequences were selected as keywords. DNA sequences are mapped to 178-dimensional Euclidean space, and SVM was used for prediction of gene regions. We showed by cross-validation that the program we developed could predict 93% of gene sequences with 7% false positives. When tested on a long genomic multi-gene sequence, our method improved nucleotide level specificity by 21%, and over 60% of predicted genes corresponded to actual genes.
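The mapping step described in the abstract (a DNA sequence turned into a fixed-length vector with one coordinate per keyword) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the three keywords are hypothetical stand-ins for the paper's 178 selected sequences, the length-normalized overlapping count is a simple substitute for the paper's α(l(τ)) measure (which the abstract does not define), and the helper names `count_overlapping` and `keyword_vector` are invented.

```python
from typing import List, Sequence


def count_overlapping(sequence: str, keyword: str) -> int:
    """Count occurrences of keyword in sequence, allowing overlaps."""
    count = 0
    start = sequence.find(keyword)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are also found.
        start = sequence.find(keyword, start + 1)
    return count


def keyword_vector(sequence: str, keywords: Sequence[str]) -> List[float]:
    """Map a DNA sequence to one length-normalized count per keyword.

    Assumption: plain frequency is used here; the paper's own measure
    is not specified in the abstract.
    """
    seq = sequence.upper()
    if not seq:
        return [0.0] * len(keywords)
    return [count_overlapping(seq, k) / len(seq) for k in keywords]


# Hypothetical keywords; the paper selects 178 of them by thresholding.
KEYWORDS = ["AAAA", "CG", "TATA"]

vector = keyword_vector("aaaaacgtata", KEYWORDS)
print(vector)
```

With the full keyword set, each sequence would become a 178-dimensional vector of this kind, which could then be passed to any SVM implementation for gene versus intergenic classification.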
Publisher
Database: Elsevier - ScienceDirect
Journal: Mathematical Biosciences - Volume 195, Issue 2, June 2005, Pages 168-178