AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP

Article ID	Journal ID	Year of publication	English article
6902125	1446498	2017	10-page PDF

Keywords

word2vec, NLP, Word embeddings, Arabic

Related subjects

Engineering and Basic Sciences; Computer Engineering; Computer Science (General)

Abstract

Advances in neural networks have driven progress in fields such as computer vision, speech recognition and natural language processing (NLP). One of the most influential recent developments in NLP is the use of word embeddings, in which words are represented as vectors in a continuous space that captures many syntactic and semantic relations among them. AraVec is an open-source project of pre-trained distributed word representations (word embeddings) that aims to provide the Arabic NLP research community with free-to-use and powerful word embedding models. The first version of AraVec provides six different word embedding models built on three different Arabic content domains: tweets, World Wide Web pages and Arabic Wikipedia articles. The total number of tokens used to build the models exceeds 3,300,000,000 (3.3 billion). This paper describes the resources used to build the models, the data cleaning techniques employed, the preprocessing steps carried out, and the details of the word embedding creation techniques used.
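The abstract's central idea, that each word is a point in a continuous vector space and that related words lie close together, can be illustrated with a minimal sketch. The three-dimensional vectors below are invented toy values, not AraVec vectors; actual AraVec models are trained with word2vec and use many more dimensions. The helpers `cosine_similarity` and `most_similar` are hypothetical functions written only for this illustration.

```python
import math

# Hypothetical toy vectors, invented for illustration only (NOT taken from AraVec).
toy_vectors = {
    "ملك": [0.9, 0.8, 0.1],     # "king"
    "ملكة": [0.85, 0.75, 0.2],  # "queen"
    "تفاحة": [0.1, 0.2, 0.9],   # "apple"
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors):
    """Return (other_word, score) for the nearest neighbour of `word` by cosine similarity."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return max(scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    # With these toy values, "queen" is the nearest neighbour of "king".
    print(most_similar("ملك", toy_vectors))
```

For a trained model loaded as a gensim `KeyedVectors` object, the `most_similar` method performs the same kind of cosine-similarity ranking over the whole vocabulary rather than a hand-written dictionary.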

Publisher

Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 117, 2017, Pages 256-265

Authors

Abu Bakr Soliman, Kareem Eissa, Samhaa R. El-Beltagy
