A high-order representation and classification method for transcription factor binding sites recognition in Escherichia coli

Article ID	Journal ID	Publication year	English article	Full-text version
4942238	1437163	2017	8-page PDF	Free download


Keywords

Tensor; Partial least squares; Computational biology; Transcription factor binding sites; Classification; Machine learning

Related topics

Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence


Abstract

- A novel tensor-based representation for TFBSs is proposed.
- Tensor-based representation captures more information than vector representation.
- Tensor-based representation captures interactions among physicochemical properties.
- Tensor-based representation alleviates the risk of over-fitting.
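
To make the contrast between a tensor and a vector representation concrete, here is a minimal sketch in plain Python. It is not the authors' construction, which this page does not describe. The layout (samples x positions x properties), the function names, and above all the property values are assumptions made for illustration; the numbers are placeholders, not measured physicochemical data.

```python
# Illustrative sketch only: the property values below are placeholders,
# not real physicochemical measurements, and the tensor layout is assumed.

NUCLEOTIDE_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def toy_property_values(dinucleotide):
    """Return three placeholder property values for one dinucleotide step."""
    first = NUCLEOTIDE_CODES[dinucleotide[0]]
    second = NUCLEOTIDE_CODES[dinucleotide[1]]
    return [float(first + second), float(first * second), float(first - second)]


def site_to_matrix(site):
    """Map one binding site to a (positions x properties) matrix."""
    steps = [site[i:i + 2] for i in range(len(site) - 1)]
    return [toy_property_values(step) for step in steps]


def sites_to_tensor(sites):
    """Stack equal-length sites into a (samples x positions x properties) array."""
    if len({len(site) for site in sites}) != 1:
        raise ValueError("all sites must have the same length")
    return [site_to_matrix(site) for site in sites]


def flatten_site(matrix):
    """Concatenate the rows into a single one-dimensional vector."""
    return [value for row in matrix for value in row]


sites = ["ACGT", "TTGA"]  # hypothetical example sequences
tensor = sites_to_tensor(sites)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
vector = flatten_site(tensor[0])

print(shape)        # (samples, positions, properties)
print(len(vector))  # positions * properties
```

In the flattened vector, nothing marks which entries belong to the same position or the same property. The nested array keeps that grouping explicit, which is the structural information the highlights refer to.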

Background

Identifying transcription factor binding sites (TFBSs) plays an important role in understanding gene regulatory processes, yet the mechanism underlying the specific binding of transcription factors (TFs) is still poorly understood. Previous machine learning-based approaches to identifying TFBSs commonly map a known TFBS to a one-dimensional vector using its physicochemical properties. However, when the dimension-to-sample ratio (i.e., the number of dimensions divided by the number of samples) is large, concatenating different physicochemical properties into a one-dimensional vector is not only likely to lose structural information but also poses significant challenges to recognition methods.

Materials and method

In this paper, we introduce a purely geometric representation, the tensor (also called a multidimensional array), to represent TFs using their physicochemical properties. Alongside this representation, we develop a tensor-based recognition method, the tensor partial least squares classifier (TPLSC). Intuitively, multidimensional arrays enable borrowing more information than one-dimensional arrays. The performance of each method is evaluated by the average F-measure on 51 Escherichia coli TFs from the RegulonDB database.

Results

In the first experiment, the results show that multiple nucleotide properties yield more power than dinucleotide properties. In the second experiment, the results demonstrate that our method gains prediction power, an improvement of roughly 33% over the best result from existing methods.

Conclusion

The representation method for TFs is an important step in TFBS recognition. We illustrate the benefits of this representation on real data through a series of experiments. This method can provide further insights into the mechanism of TF binding and be of great use for metabolic engineering applications.
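
The abstract names the average F-measure as its evaluation metric without defining it. The sketch below assumes the balanced F-measure (F1, the harmonic mean of precision and recall) and an unweighted mean across TFs. Both choices are assumptions, and the labels are made-up toy data, not results from the paper.

```python
# Illustrative sketch: assumes balanced F1 and an unweighted mean over TFs.
# The labels below are toy data, not results from the paper.


def f_measure(true_labels, predicted_labels, positive=1):
    """Balanced F-measure (F1) for one binary classification task."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f_measure(per_tf_results):
    """Unweighted mean of the F-measure over (true, predicted) pairs, one per TF."""
    scores = [f_measure(true, pred) for true, pred in per_tf_results]
    return sum(scores) / len(scores)


# Two hypothetical TFs, each with four candidate sites (1 = binding site).
results = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # one of two true sites recovered
    ([1, 0, 1, 0], [1, 0, 1, 0]),  # perfect prediction
]
average = average_f_measure(results)
print(round(average, 4))
```

An unweighted mean gives every TF the same influence regardless of how many known sites it has. A mean weighted by site count would be an equally plausible reading of "average".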

Publisher

Database: Elsevier - ScienceDirect
Journal: Artificial Intelligence in Medicine - Volume 75, January 2017, Pages 16-23

Authors

Shiquan Sun, Xiongpan Zhang, Qinke Peng
