Free article download: Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO)

Article code	Journal code	Publication year	English article	Full-text version
4966784	1449297	2017	8-page PDF	Free download

English title of the ISI article

Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO)


Keywords

Data mining; Prediction; Metadata; GEO; CEDAR

Related topics

Engineering and Basic Sciences, Computer Engineering, Computer Science Applications


Abstract

- Associations between metadata elements exist and can be predicted.
- Two algorithms perform better than frequency-based predictions.
- Our predictive approach has potential for metadata authoring.

A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type, and organism) from over 1.3 million GEO records. We examined the quality of well-supported rules from each algorithm and visualized the dependencies among metadata elements. Finally, we evaluated the performance of the algorithms in terms of accuracy, precision, recall, and F-measure. We found that PART is the best algorithm, outperforming Apriori, Predictive Apriori, and Decision Table. All algorithms perform significantly better in predicting class values than the majority vote classifier. We found that the performance of the algorithms is related to the dimensionality of the GEO elements. The average performance of all algorithms increases as the number of unique values of these elements decreases (2697 platforms, 537 organisms, 454 labels, 9 molecules, and 5 types). Our work suggests that experimental metadata such as that present in GEO can be accurately predicted using rule mining algorithms. Our work has implications for both prospective and retrospective augmentation of metadata quality, which are geared towards making data easier to find and reuse.
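To make the comparison in the abstract concrete, the following is a minimal sketch of the general idea: learn simple one-antecedent rules from existing records, predict a missing metadata value, and compare against a majority vote baseline using accuracy, precision, recall, and F-measure. It is not the PART, Apriori, Predictive Apriori, or Decision Table implementation evaluated in the paper. The records, the function names (`mine_rules`, `majority_baseline`, `predict`), and the `min_support` threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy records. The field names echo two GEO elements named in
# the abstract (organism, label); the values are invented for illustration.
TRAIN = [
    {"organism": "Homo sapiens", "label": "biotin"},
    {"organism": "Homo sapiens", "label": "biotin"},
    {"organism": "Homo sapiens", "label": "biotin"},
    {"organism": "Homo sapiens", "label": "Cy5"},
    {"organism": "Mus musculus", "label": "Cy3"},
    {"organism": "Mus musculus", "label": "Cy3"},
    {"organism": "Mus musculus", "label": "biotin"},
]
TEST = [
    {"organism": "Homo sapiens", "label": "biotin"},
    {"organism": "Mus musculus", "label": "Cy3"},
    {"organism": "Mus musculus", "label": "Cy3"},
    {"organism": "Homo sapiens", "label": "biotin"},
]


def majority_baseline(train, target):
    """Return the most frequent value of `target` in the training records."""
    return Counter(r[target] for r in train).most_common(1)[0][0]


def mine_rules(train, antecedent, target, min_support=2):
    """Learn rules of the form antecedent=value -> target=value.

    For each antecedent value, keep its most frequent target value if that
    pair occurs at least `min_support` times. Returns a dict mapping the
    antecedent value to (predicted value, support count, confidence).
    """
    counts = defaultdict(Counter)
    for r in train:
        counts[r[antecedent]][r[target]] += 1
    rules = {}
    for ant_value, targets in counts.items():
        predicted, support = targets.most_common(1)[0]
        if support >= min_support:
            confidence = support / sum(targets.values())
            rules[ant_value] = (predicted, support, confidence)
    return rules


def predict(rules, fallback, record, antecedent):
    """Apply the matching rule, or fall back to the majority value."""
    rule = rules.get(record[antecedent])
    return rule[0] if rule else fallback


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for a single class value."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


majority = majority_baseline(TRAIN, "label")
rules = mine_rules(TRAIN, "organism", "label")

y_true = [r["label"] for r in TEST]
y_rule = [predict(rules, majority, r, "organism") for r in TEST]
y_base = [majority for _ in TEST]

print("rules:", rules)
print("rule accuracy:", accuracy(y_true, y_rule))
print("baseline accuracy:", accuracy(y_true, y_base))
print("rule P/R/F for Cy3:", precision_recall_f1(y_true, y_rule, "Cy3"))
```

On this toy data the rule-based predictor recovers the label from the organism, while the majority vote baseline always predicts the single most common label. A baseline of this kind tends to weaken as the number of distinct values of an element grows, which is one reason the abstract reports the count of unique values alongside performance.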


Publisher

Database: Elsevier - ScienceDirect
Journal: Journal of Biomedical Informatics - Volume 72, August 2017, Pages 132-139

Authors

Maryam Panahiazar, Michel Dumontier, Olivier Gevaert
