Article ID: 518685
Journal: Journal of Biomedical Informatics
Published Year: 2012
Pages: 9
File Type: PDF
Abstract

Genomics has contributed to a growing collection of gene–function and gene–disease annotations that informatics can exploit to study similarity between diseases. Such analysis can yield insight into disease etiology, reveal common pathophysiology and/or suggest treatments that can be transferred from one disease to another. Estimating disease similarity solely on the basis of shared genes can be misleading, as variable combinations of genes may be associated with similar diseases, especially complex diseases. This deficiency can potentially be overcome by looking for common biological processes rather than only explicit gene matches between diseases. Using semantic similarity between biological processes to estimate disease similarity could improve the identification and characterization of similar diseases. We present functions to measure similarity between terms in an ontology, and between entities annotated with terms drawn from the ontology, based on both co-occurrence and information content. The similarity measure is shown to outperform other measures used to detect similarity. A manually curated dataset with known disease similarities was used as a benchmark to compare gene-based and Gene Ontology (GO) process-based estimates of disease similarity. Detecting disease similarity from semantic similarity between GO processes (Recall = 55%, Precision = 60%) performed better overall than using exact matches between GO processes (Recall = 29%, Precision = 58%) or gene overlap (Recall = 88%, Precision = 16%). On an external test set, the GO-process-based disease similarity scores show a statistically significant Pearson correlation (0.73) with numeric scores provided by medical residents. GO processes associated with similar diseases were found to be significantly regulated in gene expression microarray datasets of related diseases.
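The abstract does not give the formula for the term similarity measure that combines co-occurrence and information content. The sketch below shows one plausible way such a measure could be assembled; it is not the authors' metric. It assumes a toy ontology and toy annotations (all term and gene names are hypothetical), computes information content as IC(t) = -log p(t) with annotations propagated up the hierarchy, uses Lin's IC-based similarity, measures co-occurrence as the Jaccard overlap of the entities annotated with each term, and mixes the two with a hypothetical weight `alpha`.

```python
import math
from typing import Dict, Set

# Hypothetical toy ontology: term -> set of parent terms (single root).
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "metabolism": {"root"},
    "signaling": {"root"},
    "lipid_metabolism": {"metabolism"},
    "glucose_metabolism": {"metabolism"},
    "insulin_signaling": {"signaling", "glucose_metabolism"},
}

# Hypothetical annotations: entity (e.g. a gene) -> directly annotated terms.
ANNOTATIONS: Dict[str, Set[str]] = {
    "g1": {"lipid_metabolism"},
    "g2": {"glucose_metabolism"},
    "g3": {"insulin_signaling"},
    "g4": {"insulin_signaling", "lipid_metabolism"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def entities_by_term() -> Dict[str, Set[str]]:
    """Map each term to entities annotated with it or any descendant."""
    by_term: Dict[str, Set[str]] = {t: set() for t in PARENTS}
    for entity, terms in ANNOTATIONS.items():
        for term in terms:
            for anc in ancestors(term):
                by_term[anc].add(entity)
    return by_term


BY_TERM = entities_by_term()
N_ENTITIES = len(ANNOTATIONS)


def information_content(term: str) -> float:
    """IC(t) = -log p(t), where p(t) is the annotated fraction of entities."""
    count = len(BY_TERM[term])
    if count == 0:
        raise ValueError(f"term {term!r} has no annotations; IC is undefined")
    return math.log(N_ENTITIES / count)


def lin_similarity(t1: str, t2: str) -> float:
    """Lin: 2 * IC(most informative common ancestor) / (IC(t1) + IC(t2))."""
    common = ancestors(t1) & ancestors(t2)
    mica_ic = max((information_content(t) for t in common), default=0.0)
    denom = information_content(t1) + information_content(t2)
    if denom == 0.0:
        return 1.0 if t1 == t2 else 0.0
    return 2.0 * mica_ic / denom


def cooccurrence(t1: str, t2: str) -> float:
    """Jaccard overlap of the entities annotated with each term."""
    e1, e2 = BY_TERM[t1], BY_TERM[t2]
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 0.0


def combined_similarity(t1: str, t2: str, alpha: float = 0.5) -> float:
    """Hypothetical weighted mix of the IC-based and co-occurrence scores."""
    return alpha * lin_similarity(t1, t2) + (1 - alpha) * cooccurrence(t1, t2)
```

On this toy data, `lipid_metabolism` and `insulin_signaling` share only uninformative ancestors, so their Lin score is 0; gene `g4` carries both annotations, however, so the co-occurrence term contributes 1/3 and the combined score is 1/6. This is the kind of implicit relationship a purely hierarchy-based measure would miss.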

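The recall and precision figures in the abstract come from comparing predicted similar disease pairs against a curated benchmark, and the 0.73 figure is a Pearson correlation with residents' scores. A minimal sketch of both computations follows. The score threshold, the disease names and the scores are hypothetical, and the `pearson` function returns only the coefficient, not the significance test the abstract reports.

```python
import math
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Unordered disease pair, so (A, B) and (B, A) compare equal."""
    return frozenset((a, b))


def precision_recall(
    scores: Dict[Pair, float], benchmark: Set[Pair], threshold: float
) -> Tuple[float, float]:
    """Call a pair 'similar' if its score reaches the threshold, then compare."""
    predicted = {p for p, s in scores.items() if s >= threshold}
    true_pos = len(predicted & benchmark)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(benchmark) if benchmark else 0.0
    return precision, recall


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    scores = {
        pair("A", "B"): 0.8,
        pair("A", "C"): 0.4,
        pair("B", "C"): 0.7,
        pair("C", "D"): 0.2,
    }
    benchmark = {pair("A", "B"), pair("C", "D")}
    print(precision_recall(scores, benchmark, threshold=0.5))  # prints (0.5, 0.5)
```

Raising the threshold generally trades recall for precision, which is why a method with very high recall but low precision (like gene overlap in the abstract) can still be the weaker detector overall.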
Graphical abstract

Disease similarity is estimated by using an ontological metric to measure semantic similarity between Gene Ontology (GO) processes associated with diseases. The biological processes (GO processes) are determined based on the genes known to be involved in each disease. The similarity between diseases is then estimated using their underlying biological processes.

Highlights

► New metric to measure similarity between terms in an ontology.
► Application of similarity metric to quantify similarity between diseases.
► Similarity between diseases using GO-Processes.
► Curated dataset of similar diseases.
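The graphical abstract describes a two-step pipeline: collect the GO processes annotated to each disease's genes, then compare diseases through those processes. The sketch below illustrates that pipeline under stated assumptions. The gene, process and disease names are hypothetical; term similarity is a simple ancestor-overlap (Jaccard) stand-in rather than the paper's metric; and the process sets of two diseases are combined with a best-match average, a common aggregation that the abstract does not confirm the authors used.

```python
from typing import Dict, Set

# Hypothetical toy data: disease -> genes, gene -> GO-style processes.
DISEASE_GENES: Dict[str, Set[str]] = {
    "disease_A": {"G1", "G2"},
    "disease_B": {"G3"},
    "disease_C": {"G4"},
}
GENE_PROCESSES: Dict[str, Set[str]] = {
    "G1": {"insulin_signaling"},
    "G2": {"lipid_metabolism"},
    "G3": {"glucose_metabolism"},
    "G4": {"immune_response"},
}
# Hypothetical process hierarchy: term -> parent terms.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "metabolism": {"root"},
    "immune_response": {"root"},
    "lipid_metabolism": {"metabolism"},
    "glucose_metabolism": {"metabolism"},
    "insulin_signaling": {"glucose_metabolism"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_similarity(a: str, b: str) -> float:
    """Stand-in term similarity: Jaccard overlap of ancestor sets."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def disease_processes(disease: str) -> Set[str]:
    """Step 1: union of the processes annotated to the disease's genes."""
    return set().union(*(GENE_PROCESSES[g] for g in DISEASE_GENES[disease]))


def best_match_average(p1: Set[str], p2: Set[str]) -> float:
    """Step 2: average each term's best match in the other set, both ways."""
    if not p1 or not p2:
        return 0.0
    forward = sum(max(term_similarity(a, b) for b in p2) for a in p1) / len(p1)
    backward = sum(max(term_similarity(a, b) for a in p1) for b in p2) / len(p2)
    return (forward + backward) / 2


def disease_similarity(d1: str, d2: str) -> float:
    """Process-based disease similarity."""
    return best_match_average(disease_processes(d1), disease_processes(d2))


def gene_overlap(d1: str, d2: str) -> float:
    """Baseline for comparison: Jaccard overlap of the diseases' gene sets."""
    g1, g2 = DISEASE_GENES[d1], DISEASE_GENES[d2]
    return len(g1 & g2) / len(g1 | g2)
```

In this toy example `disease_A` and `disease_B` share no genes, so gene overlap scores them 0.0, but their processes sit close together under `metabolism` and the process-based score is 0.6875, well above the 0.2375 for `disease_A` versus `disease_C`. This mirrors the abstract's argument that similar diseases may involve different genes acting in common processes.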
