Article code: 4765308
Journal code: 1423858
Publication year: 2017
English article, full-text version: 10-page PDF, free download
Title of the ISI article
A corpus for mining drug-related knowledge from Twitter chatter: Language models and their utilities
Related topics
Engineering and Basic Sciences, Chemical Engineering, Chemical Engineering (General)
Abstract

In this data article, we present to the data science, natural language processing and public health communities an unlabeled corpus and a set of language models. We collected the data from Twitter using drug names as keywords, including their common misspelled forms. Using this data, which is rich in drug-related chatter, we developed language models to aid the development of data mining tools and methods in this domain. We generated several models that capture (i) distributed word representations and (ii) probabilities of n-gram sequences. The data set we are releasing consists of 267,215 Twitter posts made during the four-month period from November 2014 to February 2015. The posts mention over 250 drug-related keywords. The language models encapsulate semantic and sequential properties of the texts.

Publisher
Database: Elsevier - ScienceDirect
Journal: Data in Brief - Volume 10, February 2017, Pages 122-131