Article code: 391987
Journal code: 664584
Publication year: 2015
English article: 13-page PDF
Full-text version: Free download
English title of the ISI article
Effectively classifying short texts by structured sparse representation with dictionary filtering
Keywords
Short text classification; Sparse representation; Group sparsity; Dictionary filtering
Related topics
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
English abstract


• Structured sparsity is introduced to short text classification (STC) to address the feature sparsity problem.
• A more compact dictionary is constructed to reduce data correlation and redundancy.
• The new dictionary boosts both classification performance and efficiency.
• Experiments over five corpora show that our method outperforms traditional STC methods.
• An experiment also shows that our method is better at exploiting external sources.

Short text classification (STC) has attracted increasing interest with the rapid growth of Web and social media data in short text form. It is more challenging than traditional text classification (TC) because short texts yield sparse features, so state-of-the-art TC approaches perform poorly when applied to them directly. Existing STC approaches address the sparsity problem mainly by enriching text content with external corpora or additional information. Although this improves performance, the gain relies heavily on the amount and quality of that external information. Worse, such information is not always available and can be costly to acquire. In this paper, we introduce a structured sparse representation classifier to classify short texts effectively, and we develop an approach called convex hull vertices selection that reduces data correlation and redundancy in the dictionary (the set of training texts), which substantially boosts STC efficiency and performance. To the best of our knowledge, this is the first work that exploits structured sparsity for STC. Experiments over five datasets show that the proposed approach outperforms state-of-the-art TC methods in classification effectiveness and the traditional sparse representation (SR) classifier in both classification effectiveness and efficiency. Furthermore, an experiment on short texts expanded with additional content indirectly shows that our approach performs better than existing STC methods that exploit external text sources.
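
The abstract does not state the optimization problem, so the following is only a minimal sketch of what a structured (group) sparse representation classifier might look like, not the authors' formulation. It assumes a generic group-lasso objective, min_x 0.5*||y - Dx||^2 + lam * sum_g ||x_g||_2, where the columns of D are training texts grouped by class, and it solves that objective with plain proximal gradient descent. The function names, the `lam` and `iters` parameters, and the toy bag-of-words data are all hypothetical, and the convex hull vertices selection step is not shown.

```python
import math

def group_sparse_code(atoms, labels, y, lam=0.1, iters=200):
    """Approximately solve  min_x 0.5*||y - D x||^2 + lam * sum_g ||x_g||_2
    by proximal gradient descent. The columns of D are `atoms`, and the
    groups g are the class labels. Returns one coefficient per atom.
    (Assumed objective; the paper's exact formulation may differ.)"""
    n, dim = len(atoms), len(y)
    # The squared Frobenius norm of D upper-bounds the gradient's Lipschitz constant.
    lip = sum(v * v for a in atoms for v in a) or 1.0
    x = [0.0] * n
    classes = sorted(set(labels))
    for _ in range(iters):
        # Residual r = D x - y.
        r = [sum(x[j] * atoms[j][i] for j in range(n)) - y[i] for i in range(dim)]
        # Gradient step: z = x - D^T r / lip.
        z = [x[j] - sum(atoms[j][i] * r[i] for i in range(dim)) / lip
             for j in range(n)]
        # Block soft-thresholding: shrink each class's coefficients together.
        for c in classes:
            idx = [j for j in range(n) if labels[j] == c]
            norm = math.sqrt(sum(z[j] ** 2 for j in idx))
            scale = max(0.0, 1.0 - (lam / lip) / norm) if norm > 0 else 0.0
            for j in idx:
                x[j] = z[j] * scale
    return x

def classify(atoms, labels, y, lam=0.1):
    """Assign y to the class whose atoms alone reconstruct it with least error."""
    x = group_sparse_code(atoms, labels, y, lam)
    residuals = {}
    for c in sorted(set(labels)):
        recon = [sum(x[j] * atoms[j][i] for j in range(len(atoms)) if labels[j] == c)
                 for i in range(len(y))]
        residuals[c] = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
    return min(residuals, key=residuals.get)

# Hypothetical toy bag-of-words vectors over the vocabulary
# [goal, match, team, code, bug, server].
ATOMS = [[1, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0],
         [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1]]
LABELS = ["sports", "sports", "tech", "tech"]

print(classify(ATOMS, LABELS, [1, 0, 1, 0, 0, 0]))  # query text: "goal team"
```

The group penalty shrinks all coefficients of a class together, so a short query tends to be explained by the training texts of a few classes rather than by isolated atoms scattered across many classes. That is the general intuition behind using structured sparsity with sparse features.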

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 323, 1 December 2015, Pages 130–142