Free article download: Wikipedia-based cross-language text classification

Article code	Journal code	Publication year	English article	Full-text version
4944430	1437990	2017	17-page PDF	Free download

English title of the ISI article

Wikipedia-based cross-language text classification

Persian translation of the title

طبقه بندی متنی متقابل زبان مبتنی بر ویکیپدیا

Keywords

Cross-language text classification; Wikipedia Miner; Bag of concepts; Bag of words; Hybrid; Document representation

Related topics

Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence

Article preview

English abstract

This paper presents the application of a Wikipedia-based bag of concepts (WikiBoC) document representation to cross-language text classification (CLTC). Its main objective is to alleviate the major drawbacks of the state-of-the-art CLTC approaches - typically based on the machine translation (MT) of documents, which are represented as bags of words (BoW). We propose a technique called cross-language concept matching (CLCM), to convert concept-based representations of documents from one language to another using Wikipedia correspondences between concepts in different languages and thus not relying on automated full-text translations. We describe two proposals: the first proposal consists in the use of the WikiBoC representation in conjunction with the CLCM technique (WikiBoC-CLCM) to classify documents written in a language L1 by using an SVM algorithm that was trained with documents written in another language L2; the second proposal consists of a hybrid model for representing documents that combines WikiBoC-CLCM with the classic BoW-MT approach. To evaluate the two proposals we conducted several experiments with three cross-lingual corpora: the JRC-Acquis corpus and two purpose-built corpora composed of Wikipedia articles. The first proposal outperforms state-of-the-art approaches when training sequences are short, achieving performance increases of up to 233.33%. The second proposal outperforms state-of-the-art approaches in the whole range of training sequences, achieving performance increases of up to 23.78%. Results obtained show the benefits of the WikiBoC-CLCM approach, since concepts extracted from documents add useful information to the classifier, thus improving its performance.
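The abstract describes CLCM only at a high level, so the following is a minimal Python sketch of the idea rather than the authors' implementation. It assumes a bag of concepts is a mapping from Wikipedia article titles to weights, and that a table of cross-language correspondences is already available. The table `ES_TO_EN`, the functions `clcm` and `hybrid`, the choice to drop concepts with no correspondence, and the `w:`/`c:` key prefixes are all hypothetical and introduced here for illustration.

```python
from collections import Counter

# Hypothetical correspondence table: Spanish Wikipedia title -> English title.
# Assumption: in practice this would be built from Wikipedia's own links
# between language editions; these three entries are illustrative only.
ES_TO_EN = {
    "Aprendizaje automático": "Machine learning",
    "Traducción automática": "Machine translation",
    "Gato": "Cat",
}


def clcm(bag_of_concepts, links):
    """Map a bag of concepts into another language.

    Assumption: concepts with no correspondence in `links` are dropped.
    """
    mapped = Counter()
    for concept, weight in bag_of_concepts.items():
        target = links.get(concept)
        if target is not None:
            mapped[target] += weight
    return mapped


def hybrid(bag_of_words, bag_of_concepts):
    """Merge word and concept features into one feature dict.

    Prefixes keep a word and a concept with the same name from colliding.
    """
    features = {f"w:{term}": value for term, value in bag_of_words.items()}
    features.update({f"c:{c}": value for c, value in bag_of_concepts.items()})
    return features


# A Spanish document represented as concepts, one of which has no link.
doc_es = Counter({"Aprendizaje automático": 2, "Gato": 1, "Concepto sin enlace": 3})
doc_en = clcm(doc_es, ES_TO_EN)

# Hybrid representation: machine-translated words plus matched concepts.
doc_hybrid = hybrid({"cat": 1, "learning": 2}, doc_en)
```

Feature dicts of this shape could then be vectorized and passed to an SVM trained on documents in the other language, which is the setting the abstract describes; the vectorizer and kernel are left open here because the abstract does not specify them.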

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volumes 406–407, September 2017, Pages 12-28

Authors

Marcos Antonio Mouriño García, Roberto Pérez Rodríguez, Luis Anido Rifón
