Entity Extraction for Malayalam Social Media Text Using Structured Skip-gram Based Embedding Features from Unlabeled Data

Article code	Journal code	Publication year	English article
571040	1446522	2016	7-page PDF

Keywords

SVM classifier; Stylometric features

Related subjects

Engineering and Basic Sciences; Computer Engineering; Computer Science (General)

Abstract

Social media text is generally informal and noisy, but it sometimes carries informative content. Extracting this informative content, such as entities, is a challenging task. The main aim of this paper is to extract entities efficiently from Malayalam social media text. The social media corpus used in our system comes from the FIRE2015 entity extraction task. The data is first pre-processed, features are then extracted from it, and entity extraction is performed on those features. In addition to conventional stylometric features such as prefixes, suffixes and hashtags, and part-of-speech (POS) tags, unsupervised word embedding features obtained from the Structured Skip-gram model are used to train the system. The extracted features are given to a Support Vector Machine (SVM) classifier to build and train the model. On testing, the system achieved better accuracy than the existing systems evaluated in the FIRE2015 tasks. The unsupervised features obtained from the Structured Skip-gram model are a key reason for this improved performance.
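As a rough illustration of the feature set described in the abstract, the following sketch builds a per-token feature dictionary that combines stylometric features (prefixes, suffixes, a hashtag flag), a POS tag, and a word embedding vector. It is a minimal sketch under stated assumptions, not the authors' implementation. The affix lengths, the extra flags (`is_mention`, `has_digit`, `length`), the embedding dimension, and the helper names `stylometric_features` and `token_features` are all hypothetical. The embeddings are assumed to have been precomputed by a Structured Skip-gram model and supplied as a plain dictionary.

```python
from typing import Dict, List, Mapping, Sequence

# Hypothetical settings: the abstract does not give the affix lengths or the
# embedding size, so these values are illustrative only.
AFFIX_LENGTHS = (1, 2, 3)
EMBED_DIM = 4


def stylometric_features(token: str) -> Dict[str, object]:
    """Surface (stylometric) features of a single token."""
    feats: Dict[str, object] = {
        "is_hashtag": token.startswith("#"),
        # Hypothetical extra flags, not listed in the abstract.
        "is_mention": token.startswith("@"),
        "has_digit": any(ch.isdigit() for ch in token),
        "length": len(token),
    }
    # Character prefixes and suffixes of several lengths; for short tokens,
    # Python slicing simply returns the whole token.
    for n in AFFIX_LENGTHS:
        feats[f"prefix_{n}"] = token[:n]
        feats[f"suffix_{n}"] = token[-n:]
    return feats


def token_features(
    tokens: Sequence[str],
    pos_tags: Sequence[str],
    embeddings: Mapping[str, List[float]],
    i: int,
) -> Dict[str, object]:
    """Combine stylometric, POS and embedding features for tokens[i]."""
    feats = stylometric_features(tokens[i])
    feats["pos"] = pos_tags[i]
    # Assumed: embeddings were precomputed (e.g. by a Structured Skip-gram
    # model); out-of-vocabulary tokens fall back to a zero vector.
    vector = embeddings.get(tokens[i], [0.0] * EMBED_DIM)
    for d, value in enumerate(vector):
        feats[f"emb_{d}"] = value
    return feats
```

For example, `token_features(["Mohanlal", "#Kochi"], ["NNP", "NNP"], {"Mohanlal": [0.1, 0.2, 0.3, 0.4]}, 1)` marks the second token as a hashtag, gives it the prefix `#Ko` and the suffix `chi`, and uses a zero vector because the token has no embedding. The POS labels here are illustrative only. Feature dictionaries of this shape can be encoded with scikit-learn's `DictVectorizer` and passed to an SVM such as `LinearSVC`. The abstract does not say which SVM implementation or settings the authors actually used.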

Publisher

Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 93, 2016, Pages 547–553

Authors

G. Remmiya Devi, P.V. Veena, M. Anand Kumar, K.P. Soman
