Article code: 4942238
Journal code: 1437163
Publication year: 2017
English article: 8 pages, PDF
Full-text version: free download
English title of the ISI article
A high-order representation and classification method for transcription factor binding sites recognition in Escherichia coli
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
English abstract


- A novel tensor-based representation for TFBSs is proposed.
- Tensor-based representation captures more information than vector representation.
- Tensor-based representation captures interactions among physicochemical properties.
- Tensor-based representation alleviates the risk of over-fitting.

Background: Identifying transcription factor binding sites (TFBSs) plays an important role in understanding gene regulatory processes. The underlying mechanism of the specific binding of transcription factors (TFs) is still poorly understood. Previous machine learning-based approaches to identifying TFBSs commonly map a known TFBS to a one-dimensional vector using its physicochemical properties. However, when the dimension-to-sample ratio (i.e., number of dimensions / number of samples) is large, concatenating different physicochemical properties into a one-dimensional vector is not only likely to lose some structural information but also poses significant challenges to recognition methods.

Materials and method: In this paper, we introduce a purely geometric representation, the tensor (also called a multidimensional array), to represent TFs using their physicochemical properties. Alongside the multidimensional array representation, we also develop a tensor-based recognition method, the tensor partial least squares classifier (TPLSC). Intuitively, multidimensional arrays make it possible to exploit more information than one-dimensional arrays. The performance of each method is evaluated by the average F-measure on 51 Escherichia coli TFs from the RegulonDB database.

Results: In our first experiment, the results show that multiple nucleotide properties yield more predictive power than dinucleotide properties. In the second experiment, the results demonstrate that our method achieves increased prediction power, roughly a 33% improvement over the best result from existing methods.

Conclusion: The representation method for TFs is an important step in TFBS recognition. We illustrate the benefits of this representation on real data through a series of experiments. This method can provide further insight into the mechanism of TF binding and be of great use in metabolic engineering applications.
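
To make the contrast between the two representations concrete, the following is a minimal sketch, not the paper's actual construction. It assumes a binding site is described by dinucleotide steps and keeps one row per property (a second-order array), then shows the flattened one-dimensional alternative. The property names (`prop_a`, `prop_b`, `prop_c`), the numeric values, and the example site are hypothetical placeholders, not measured physicochemical data.

```python
from itertools import product

# All 16 dinucleotides in a fixed order.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]

# Hypothetical property table: placeholder numbers, NOT measured values.
PROPERTY_NAMES = ["prop_a", "prop_b", "prop_c"]
PROPERTY_TABLE = {
    name: {dn: float((i * (k + 1)) % 7) for i, dn in enumerate(DINUCLEOTIDES)}
    for k, name in enumerate(PROPERTY_NAMES)
}


def to_matrix(site):
    """Return a properties x positions array (one row per property)."""
    steps = [site[i:i + 2] for i in range(len(site) - 1)]
    return [[PROPERTY_TABLE[name][s] for s in steps] for name in PROPERTY_NAMES]


def to_vector(site):
    """Concatenate the property rows into a single one-dimensional vector."""
    return [value for row in to_matrix(site) for value in row]


site = "ACGTTGCA"  # hypothetical 8-bp site, so 7 dinucleotide steps
matrix = to_matrix(site)
vector = to_vector(site)
print(len(matrix), len(matrix[0]))  # 3 7
print(len(vector))                  # 21
```

In the array form, the property axis and the position axis stay separate, so a model can treat them differently. In the flattened form, that grouping is only implicit in the index order, which is the structural information the abstract says concatenation is likely to lose.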

Publisher
Database: Elsevier - ScienceDirect
Journal: Artificial Intelligence in Medicine - Volume 75, January 2017, Pages 16-23