Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
518682 | Journal of Biomedical Informatics | 2012 | 14 Pages |
The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous work has shown that knowledge constructs composed of transitively associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS is a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (kDLS) for the UMLS, together with a corresponding method for evaluating the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS for very efficient large-scale knowledge discovery. We demonstrated that kDLS paths are highly effective for prioritizing disease-gene relations across the whole genome, yielding extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large-scale knowledge discovery on the UMLS as a whole. We expect that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS in future medical informatics applications.
Highlights
► kDLS provides a comprehensive solution to index the UMLS.
► Reachability, distance and path queries on the UMLS can be efficiently answered.
► kDLS enables large-scale knowledge discovery on the UMLS.
► Genes can be effectively prioritized with respect to diseases.
► New hypotheses on gene–disease relations can be generated.