Article code: 494947
Journal code: 862810
Publication year: 2015
English article: 9-page PDF
Full-text version: Free download
English title of the ISI article
Incorporating self-organizing map with text mining techniques for text hierarchy generation
Persian translation of the title
شامل نقشه خود سازمانی با تکنیک های استخراج متن برای نسل سلسله مراتبی متن
Keywords
Text mining, self-organizing map, topic identification, hierarchy generation
Related subjects
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract


• Incorporating topic identification into SOM learning could benefit the text categorization task.
• Both lateral and hierarchical expansion during SOM learning were achieved according to criteria based on the identified topics.
• The produced text hierarchies outperformed contemporary approaches in hierarchy quality and in text categorization performance.

Self-organizing maps (SOM) have been applied to numerous data clustering and visualization tasks and have received much attention for their success. One major shortcoming of the classical SOM learning algorithm is that the map topology must be predefined. Furthermore, hierarchical relationships among data are difficult to discover. Several approaches have been devised to overcome these deficiencies. In this work, we propose a novel SOM learning algorithm that incorporates several text mining techniques to expand the map both laterally and hierarchically. When trained on a set of text documents, the proposed algorithm first clusters them using the classical SOM algorithm. We then identify the topics of each cluster. These topics are then used to evaluate the criteria for expanding the map. The major characteristic of the proposed approach is that it combines the learning process with the text mining process, which makes it suitable for automatic organization of text documents. We applied the algorithm to the Reuters-21578 dataset in text clustering and categorization tasks. Our method outperforms two comparison models in hierarchy quality according to users’ evaluation. It also achieves better F1-scores than two other models in the text categorization task.
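
The first step named in the abstract, clustering documents with the classical SOM algorithm on a fixed, predefined grid, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`train_som`, `best_matching_unit`, `cluster`), the linear decay schedules, the Gaussian neighbourhood, and the toy term-frequency vectors are all assumptions chosen for illustration. The topic identification and map expansion steps are not sketched, because the abstract does not define their criteria.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a classical SOM on a fixed rows x cols grid.

    The grid size is the predefined topology that the abstract describes
    as a shortcoming. Decay schedules here are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per map unit, initialised randomly in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
        order = list(vectors)
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                # Pull each unit toward x, scaled by its grid distance to the BMU.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def cluster(weights, vectors):
    """Group document indices by the map unit they are assigned to."""
    clusters = {}
    for idx, x in enumerate(vectors):
        clusters.setdefault(best_matching_unit(weights, x), []).append(idx)
    return clusters


# Toy term-frequency vectors (hypothetical data, not Reuters-21578).
documents = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.9, 1.0],
]
som = train_som(documents, rows=1, cols=2)
print(cluster(som, documents))
```

In this sketch each resulting cluster is simply the list of documents sharing a best matching unit; a topic-driven approach such as the one proposed would then inspect these clusters to decide whether and where to grow the map.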

This work proposes a scheme to improve the self-organizing map algorithm. The figure is the overall flowchart of the proposed algorithm. The key ingredients include:
1. A novel topic identification scheme.
2. Lateral expansion using a novel topic incompatibility measure.
3. Hierarchical expansion scheme using novel cluster size and topic size criteria.

Publisher
Database: Elsevier - ScienceDirect
Journal: Applied Soft Computing - Volume 34, September 2015, Pages 251–259