Article code: 1118872
Journal code: 1488464
Publication year: 2013
English article: 7 pages, PDF
Full-text version: free download
English title of the ISI article
Design and Annotation of MultiMedica – A Multilingual Text Corpus of the Biomedical Domain
Related topics
Social Sciences and Humanities, Arts and Humanities, Arts and Humanities (General)
English abstract

This article describes the MultiMedica corpus, a multilingual collection of Spanish, Japanese, and Arabic texts from the biomedical domain. This novel combination of languages has been chosen with two purposes: the contrastive study of three languages that are typologically and genetically different, and the creation of a gold standard to develop and evaluate an Automatic Term Recognition (ATR) system. A total of 51,476 documents have been collected from the Web, and the corpus contains over seven and a half million words. Most documents were written by medical doctors and edited by journalists for the general public. Each text has been tagged for Part-of-Speech and indexed in an Information Retrieval system and a concordance interface that is aimed at students of Translation, Medicine, and Medical Humanities.

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia - Social and Behavioral Sciences - Volume 95, 25 October 2013, Pages 33-39