Article code: 519019
Journal code: 867633
Publication year: 2008
English article: 15 pages, PDF
Full-text version: free download
English title of the ISI article
Non-linear correlation of content and metadata information extracted from biomedical article datasets
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Science Applications
Abstract

Biomedical literature databases constitute valuable repositories of up-to-date scientific knowledge. The development of efficient machine learning methods to facilitate the organization of these databases and the extraction of novel biomedical knowledge is becoming increasingly important. Several of these methods require the documents to be represented as vectors of variables, forming large multivariate datasets. Since the amount of information contained in different datasets is voluminous, an open issue is how to combine information gained from various sources into a concise new dataset that efficiently represents the corpus of documents. This paper investigates the use of a multivariate statistical approach, Non-Linear Canonical Correlation Analysis (NLCCA), for exploiting the correlation among the variables of different document representations and describing the documents with a single new dataset. Experiments with document datasets represented by text words, Medical Subject Headings (MeSH) and Gene Ontology (GO) terms showed the effectiveness of NLCCA.

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Biomedical Informatics - Volume 41, Issue 1, February 2008, Pages 202–216