Relax with CouchDB — Into the non-relational DBMS era of bioinformatics

Article ID	Journal	Published Year	Pages	File Type
2820975	Genomics	2012	7 Pages	PDF

Abstract

With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug–target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services.

► We present the first three applications of CouchDB to bioinformatics.
► geneSmash is a new web service integrating gene annotations and genomic location.
► drugBase supports gene-based batch queries to get drug–target interactions.
► HapMap-CN supports gene-based queries of copy number analyses.
► Applications use standard internet protocols, usable in all programming languages.
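
To illustrate what access over standard internet protocols can look like, here is a minimal Python sketch that reads one document from a CouchDB database through CouchDB's HTTP document endpoint (GET /{db}/{docid}). The abstract does not give the actual geneSmash, drugBase, or HapMap-CN endpoints, so the base URL, the database name "genes", and the document ID "TP53" are hypothetical placeholders, and the helper names are introduced here purely for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def couchdb_doc_url(base_url, database, doc_id):
    """Build the URL for CouchDB's GET /{db}/{docid} document endpoint.

    Database name and document ID are percent-encoded so that characters
    such as '/' cannot be mistaken for path separators.
    """
    return "/".join([
        base_url.rstrip("/"),
        quote(database, safe=""),
        quote(doc_id, safe=""),
    ])


def fetch_doc(base_url, database, doc_id, timeout=10):
    """Fetch one document and decode its JSON body into a dict.

    Requires a reachable CouchDB server; nothing here is specific to the
    services described in the article.
    """
    url = couchdb_doc_url(base_url, database, doc_id)
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical example: a local CouchDB on its default port 5984,
# with an assumed database "genes" keyed by gene symbol.
example_url = couchdb_doc_url("http://localhost:5984/", "genes", "TP53")
```

Because the request is plain HTTP and the response is JSON, the same call could be made from any language with an HTTP client, which is the point the last highlight makes.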

Keywords

Copy number variation; Drug–target interaction; Data integration