Article code, Journal code, Publication year, English article, Full-text version
4627409, 1631814, 2014, 18 pages (PDF), Free download
English title of the ISI article
A keyword extraction method from twitter messages represented as graphs
Persian translation of the title
یک روش استخراج کلمه کلیدی از پیام های توییتر به عنوان نمودار ارائه شده است
Keywords
Knowledge discovery, text mining, keyword extraction, graph theory, centrality measures, Twitter data
Related topics
Engineering and Basic Sciences, Mathematics, Applied Mathematics
English abstract

Twitter is a microblog service that generates a huge amount of textual content daily. All this content needs to be explored by means of text mining, natural language processing, information retrieval, and other techniques. In this context, automatic keyword extraction is a highly useful task. A fundamental step in text mining consists of building a model for text representation. The vector space model (VSM) is the best known and most widely used of these models. However, some difficulties and limitations of VSM, such as scalability and sparsity, motivate the proposal of alternative approaches. This paper proposes a keyword extraction method for tweet collections, TKG, that represents texts as graphs and applies centrality measures to find the relevant vertices (keywords). To assess the performance of the proposed approach, three different sets of experiments are performed. The first experiment applies TKG to a text from Time magazine and compares its performance with results from the literature. The second set of experiments takes tweets from three different TV shows, applies TKG, and compares it with TFIDF and KEA, using human classifications as benchmarks. Finally, these three algorithms are applied to tweet sets of increasing size, and their computational running times are measured and compared. Altogether, these experiments provide a general overview of how TKG can be used in practice, how it performs compared with other standard approaches, and how it scales to larger data instances. The results show that TKG is a novel and robust proposal for extracting keywords from texts, particularly from short messages such as tweets.
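
The general idea in the abstract (texts represented as a graph, keywords read off as the most central vertices) can be illustrated with a minimal Python sketch. This is not the authors' TKG implementation: the abstract does not say how edges are built or which centrality measure is used, so the sketch assumes that adjacent tokens within a tweet are linked and that vertices are ranked by degree. The stopword list and the names `tokenize`, `build_graph`, and `top_keywords` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "of", "to", "and", "in", "on", "for"}


def tokenize(tweet):
    """Lowercase each word, strip surrounding punctuation, drop stopwords."""
    words = (w.strip(".,!?:;\"'()").lower() for w in tweet.split())
    return [w for w in words if w and w not in STOPWORDS]


def build_graph(tweets):
    """Build an undirected graph: tokens are vertices, and two tokens share
    an edge when they appear next to each other in some tweet.
    Assumption: nearest-neighbour edges; the source does not specify this."""
    adjacency = defaultdict(set)
    for tweet in tweets:
        tokens = tokenize(tweet)
        for left, right in zip(tokens, tokens[1:]):
            if left != right:  # skip self-loops from repeated words
                adjacency[left].add(right)
                adjacency[right].add(left)
    return adjacency


def top_keywords(tweets, k=3):
    """Rank vertices by degree (number of distinct neighbours) and return
    the k highest; ties are broken alphabetically for determinism."""
    adjacency = build_graph(tweets)
    ranked = sorted(adjacency, key=lambda v: (-len(adjacency[v]), v))
    return ranked[:k]


if __name__ == "__main__":
    sample = [
        "The finale of the show was great",
        "Great show tonight",
        "show finale tonight",
    ]
    print(top_keywords(sample, k=1))
```

Degree is only one of several centrality measures that could be substituted in `top_keywords`; the ranking step is isolated in that function so another measure could be swapped in without touching the graph construction.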

Publisher
Database: Elsevier - ScienceDirect
Journal: Applied Mathematics and Computation - Volume 240, 1 August 2014, Pages 308–325
Authors
, ,