Article code: 1114740 | Journal code: 1488412 | Year of publication: 2014 | English article: 7 pages | Full text: PDF
English title of the ISI article
Text Segmentation for Language Identification in Greek Forums
Related subjects
Humanities and Social Sciences > Arts and Humanities > Arts and Humanities (General)
English abstract

In this paper, we examine the benefit of applying text segmentation methods to language identification in forums. The focus is on forums whose content mixes Greek, English, and Greeklish, where Greeklish is Greek written in the Latin alphabet. For the evaluation, a corpus was created manually by collecting web pages from Greek university forums, more specifically pages that combine Greek with English technical terminology and Greeklish. An evaluation with two well-known text segmentation algorithms leads to the conclusion that, despite the difficulty of the problem, text segmentation appears to be a promising solution.
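To show why this mixture is hard, here is a minimal sketch of a naive baseline that is not one of the two algorithms evaluated in the paper (the abstract does not name them). It splits a post on whitespace, labels each token by the Unicode script of its first letter, and merges adjacent tokens with the same label into segments. The functions `script_of` and `segment_by_script` are hypothetical illustrations. Greek-script text separates cleanly from everything else, but English and Greeklish both use the Latin alphabet, so script alone cannot tell them apart. That gap is the part of the problem that real segmentation methods have to address.

```python
import unicodedata
from itertools import groupby


def script_of(token: str) -> str:
    """Hypothetical helper: label a token 'greek', 'latin', or 'other'
    from the Unicode character name of its first alphabetic character."""
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("GREEK"):
                return "greek"
            if name.startswith("LATIN"):
                return "latin"
            return "other"
    return "other"  # digits, punctuation, emoji, etc.


def segment_by_script(text: str) -> list[tuple[str, str]]:
    """Naive baseline (an assumption, not the paper's method): merge
    consecutive whitespace-separated tokens that share a script label."""
    labelled = [(script_of(tok), tok) for tok in text.split()]
    return [
        (label, " ".join(tok for _, tok in group))
        for label, group in groupby(labelled, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    # "sto" is Greeklish for the Greek word "στο"; "github" is English.
    print(segment_by_script("Δείτε το project sto github"))
```

On the sample post, the Greek-script words come out as one segment. The English term "project", the Greeklish word "sto", and the English "github" are all merged into a single "latin" segment, which is exactly the ambiguity the abstract calls difficult.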

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia - Social and Behavioral Sciences - Volume 147, 25 August 2014, Pages 160-166