Article ID: 6370079 | Journal: Journal of Theoretical Biology | Published Year: 2015 | Pages: 8 | File Type: PDF
Abstract
Circular codes are putative remnants of primeval comma-free codes and are potentially involved in detecting and maintaining the normal reading frame in protein-coding sequences. Michel and Pirillo (2013a) showed by computer search that no maximal trinucleotide circular code can encode more than 18 different amino acids under the standard version of the genetic code. For comma-free codes the maximum is even lower, namely 13 (Michel, 2014). The main purpose of this paper is to investigate these facts from a mathematical point of view and to show why the codes with the best-known error-detecting properties are limited in the number of amino acids they can encode. We introduce five hierarchically ordered classes of trinucleotide codes, including the well-known comma-free and circular codes, and prove combinatorially that it is impossible to encode all amino acids using codes from the four of the five classes that have the strongest error-detecting properties. However, it is possible to encode all 20 amino acids using codes from the largest class, which has the weakest properties. Additionally, we develop a handy criterion for circularity. As an application, we show that all codes from a special class of trinucleotide codes, which includes the RNY primeval code (Shepherd, 1986), are automatically circular. We also list which amino acids these codes encode.
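To make the notions in the abstract concrete, the following is a minimal Python sketch, not the paper's method. It checks comma-freeness straight from the usual definition (no codon of the code appears in a shifted frame of any concatenation of two codons) and lists the amino acids a code encodes under the standard genetic code. The function names `is_comma_free` and `encoded_amino_acids` are hypothetical, and the sketch assumes DNA notation (T rather than U) and the usual reading of RNY as R = {A, G}, N = any base, Y = {C, T}. It does not implement the circularity criterion developed in the paper.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def is_comma_free(code):
    """Return True if no codon of `code` occurs in frame 1 or frame 2
    of the concatenation of any two codons of `code`."""
    code = set(code)
    for x, y in product(code, repeat=2):
        w = x + y
        if w[1:4] in code or w[2:5] in code:
            return False
    return True


def encoded_amino_acids(code):
    """Return the set of amino acids (one-letter codes) encoded by `code`,
    ignoring stop codons."""
    return {CODON_TABLE[c] for c in code} - {"*"}


# RNY code: purine, any base, pyrimidine (16 trinucleotides).
RNY = {r + n + y for r in "AG" for n in "ACGT" for y in "CT"}

if __name__ == "__main__":
    print(is_comma_free(RNY))                # True
    print(sorted(encoded_amino_acids(RNY)))  # ['A', 'D', 'G', 'I', 'N', 'S', 'T', 'V']
```

Under these assumptions the RNY code is comma-free and covers 8 of the 20 amino acids. A single codon such as AAA already fails the check, because it reappears in every shifted frame of AAAAAA.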