Article ID: 38307
Journal: World Patent Information
Published Year: 2008
Pages: 9
File Type: PDF
Abstract

Nucleic acid and protein sequence data from patent publications are available from a range of commercial and public sources. Because searching and analysing these data is of crucial importance to the life sciences industry, the Patent Documentation Group’s Biotechnology Information Working Group conducted a study to critically compare and evaluate the data content of patent sequence databases. A series of sequences was searched to find similar sequences in several well-known sources: GENESEQ™, CAS REGISTRY/CAplus℠, PCTGEN, NCBI GenBank®, EMBL-Bank and the EBI Fasta databases. The study highlights some differences between GENESEQ™ and REGISTRY/CAplus℠ results in the context of indexing policy and patent coverage. In comparison with the proprietary databases, the authors identified important deficiencies in the content of the public databanks. The paper also discusses database timeliness and the choice of algorithm as potential reasons for missing data.

Related Topics
Physical Sciences and Engineering > Chemical Engineering > Bioengineering