Article code: 2818136
Journal code: 1160033
Publication year: 2012
English article: 7-page PDF
Full-text version: free download
English title of the ISI article
The distribution of GC nucleotides and regulatory sequence motifs in genes and their adjacent sequences
Related topics
Life Sciences and Biotechnology > Biochemistry, Genetics and Molecular Biology > Genetics
English abstract

The genomes of warm-blooded vertebrates are a mosaic of alternating fragments, isochores, with low and high GC contents and embedded genes. The evolutionary mechanisms leading to such structures are not fully understood. We have compared the distributions of GC base pairs in coding sequences and sequences spanning 5 kb upstream and downstream of genes in human and other species annotated in the RefSeq database and in different isochores of the human genome. Using our computer application NucleoSeq (available at www.bioinformatics.aei.polsl.pl), we also compared the average distributions of AT-rich regulatory motifs and transcription factor binding sites (TFBS) for single transcription factors with those in randomized sequences of the human genome, and revealed that some TFBS have a lower average frequency in a gene's promoter than in the randomized sequence, whereas for other transcription factors the opposite is observed. TFBS for some transcription factors show a higher frequency in the coding sequence than in the regulatory and randomized sequences, suggesting their accumulation during evolution and possible functional roles. On the basis of the GC content in genes and their adjacent sequences, which was similar in all species studied here, and the distribution of regulatory motifs, we hypothesize that the first step in the evolution of many genes existing today was the joining of a GC-rich coding sequence to a region with a lower GC content and the potential to create regulatory motifs.
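
The GC-content comparison described in the abstract can be illustrated with a minimal Python sketch. This is not NucleoSeq and not the authors' procedure; the function names, the window size, and the choice to ignore ambiguous bases such as N are assumptions made only for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc(seq: str, window: int = 100, step: int = 100) -> list[float]:
    """GC fraction in consecutive windows, e.g. along a 5 kb flanking region.

    Trailing bases that do not fill a whole window are dropped.
    """
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]
```

Applied separately to a coding sequence and to its upstream and downstream flanks, a profile like this would show whether the coding part is GC-richer than its surroundings, which is the kind of pattern the hypothesis in the abstract rests on.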


► We quantified regulatory motifs inside coding sequences (CDS) and adjacent sequences.
► Transcription factor binding sites and AU-rich elements (AREs) were analyzed.
► Some transcription factor binding sites are most frequent in coding sequences.
► ARE motifs are significantly more abundant upstream and downstream than within CDS.
► They are also more frequent in real than in randomized sequences, except for CDS.
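
The comparison of motif frequencies in real versus randomized sequences can be sketched as follows. This is a hedged illustration only: the randomization scheme used in the paper is not stated here, so a simple shuffle that preserves base composition is assumed, and `count_motif` and `shuffled_mean_count` are hypothetical helper names. The pentamer ATTTA (the DNA form of the AUUUA core of AU-rich elements) serves as the example motif.

```python
import random


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, including overlapping ones."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def shuffled_mean_count(seq: str, motif: str,
                        n_shuffles: int = 100, seed: int = 0) -> float:
    """Mean motif count over shuffled copies of seq.

    Shuffling keeps the base composition but destroys the order
    (an assumed null model, not necessarily the one used in the paper).
    """
    rng = random.Random(seed)
    bases = list(seq)
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        total += count_motif("".join(bases), motif)
    return total / n_shuffles


# Example: observed count versus the shuffled baseline for one sequence.
region = "ATTTATTTAGCGCGCATTTA"
observed = count_motif(region, "ATTTA")
expected = shuffled_mean_count(region, "ATTTA")
```

An observed count well above the shuffled mean would indicate enrichment of the motif relative to base composition alone; a real analysis would also need a significance test over many sequences.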

Publisher
Database: Elsevier - ScienceDirect
Journal: Gene - Volume 492, Issue 2, 25 January 2012, Pages 375–381