Article ID: 518682
Journal: Journal of Biomedical Informatics
Published Year: 2012
Pages: 14 Pages
File Type: PDF
Abstract

The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous work has shown that knowledge constructs composed of transitively associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS is a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (kDLS) for the UMLS, together with a corresponding method for effectively evaluating the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS to support efficient large-scale knowledge discovery. We demonstrated that kDLS paths are highly effective for prioritizing disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large-scale knowledge discovery on the UMLS as a whole. We expect that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS in future medical informatics applications.

Highlights
► kDLS provides a comprehensive solution to index the UMLS.
► Reachability, distance and path queries on the UMLS can be efficiently answered.
► kDLS enables large-scale knowledge discovery on the UMLS.
► Genes can be effectively prioritized with respect to diseases.
► New hypotheses on gene–disease relations can be generated.
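
The abstract does not describe how kDLS is built, so the following is only a generic illustration of how a k-neighborhood label index can answer reachability and distance queries without traversing the graph at query time. It is not the authors' algorithm. The function names, the toy edge list, and the concept names are all hypothetical. Each node stores the nodes it can reach within k hops (an out-label) and the nodes that can reach it within k hops (an in-label); intersecting the two labels gives the exact hop distance whenever that distance is at most 2k.

```python
from collections import deque


def k_hop_labels(adj, k):
    """For every node, map each node reachable within k hops to its hop distance."""
    labels = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if dist[node] == k:
                continue  # do not expand beyond the k-hop boundary
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        labels[source] = dist
    return labels


def build_index(edges, k):
    """Build out-labels (forward BFS) and in-labels (backward BFS) for a directed graph."""
    nodes = {n for edge in edges for n in edge}
    fwd = {n: [] for n in nodes}
    bwd = {n: [] for n in nodes}
    for src, dst in edges:
        fwd[src].append(dst)
        bwd[dst].append(src)
    return k_hop_labels(fwd, k), k_hop_labels(bwd, k)


def distance(out_labels, in_labels, u, v):
    """Exact hop distance from u to v if it is at most 2k, otherwise None."""
    out_u, in_v = out_labels[u], in_labels[v]
    common = out_u.keys() & in_v.keys()
    if not common:
        return None  # unreachable, or farther apart than 2k hops
    return min(out_u[w] + in_v[w] for w in common)


# Hypothetical toy chain of concepts; these are not real UMLS relations.
EDGES = [
    ("disease", "symptom"),
    ("symptom", "pathway"),
    ("pathway", "protein"),
    ("protein", "gene"),
    ("gene", "variant"),
]
out_labels, in_labels = build_index(EDGES, k=2)
four_hops = distance(out_labels, in_labels, "disease", "gene")
five_hops = distance(out_labels, in_labels, "disease", "variant")
```

With k = 2, the query from "disease" to "gene" is answered from the shared label entry "pathway", while the five-hop query to "variant" falls outside the 2k range and returns None. A larger k widens the range of answerable queries at the cost of larger labels, which is the general trade-off any neighborhood-based index has to manage.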

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science Applications