Article ID: 4508297
Journal: Current Opinion in Insect Science
Published Year: 2015
Pages: 7
File Type: PDF
Abstract

• RNA-Seq is useful for gene prediction but surprisingly many errors remain.
• Statistical models, genome and protein data help to identify protein coding genes.
• Clade genome sequencing projects require the development of new methods.
• Try tools that have been shown to work well instead of what ‘everybody’ uses.

We review software tools for gene prediction, that is, the identification of protein-coding genes and their structure in genome sequences. The approaches discussed include methods based on RNA-Seq and current methods based on homology, namely comparative gene prediction and protein spliced alignments. Many methods require their parameters to be adjusted to the target species or its broader clade. These include ab initio gene finders, integrated approaches with ab initio components, and some aligners. We also review current automatic training methods for the common case in which a bona fide training set of gene structures is not available before annotation.
