Article code | Journal code | Publication year | English article | Full-text version
11021161 | 1715033 | 2018 | 12-page PDF | Free download
English title of the ISI article
Unsupervised measure of Chinese lexical semantic similarity using correlated graph model for news story segmentation
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
This paper presents a simple yet effective unsupervised approach to measuring Chinese lexical semantic similarity and shows its promising performance in automatic story segmentation of Mandarin broadcast news. Our approach centers on the unsupervised correlated affinity graph (UCAG) model, which is initialized as a hybrid sparse graph encoding both explicit word-to-word contextual correlations and latent word-to-character correlations within the given corpus. The UCAG model then diffuses the initial sparse correlations throughout the graph by parallel affinity propagation. This yields a dense, reliable, and corpus-specific lexical semantic similarity measure derived purely from unlabeled data. We then generalize the classical cosine similarity metric to take soft similarities into account for story segmentation. Extensive experiments on benchmark datasets validate the superiority of the proposed similarity measure over previous measures. Specifically, our similarity measure yields an average 7.7% relative F1-score improvement for state-of-the-art normalized cuts (NCuts) based story segmentation on two holistic benchmark Mandarin broadcast news corpora, TDT2 and CCTV, and a 10.8% relative F1-score improvement on the detailed broadcast news subsets.
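The abstract mentions generalizing cosine similarity so that soft word-to-word similarities count, but it does not give the formula. As a rough illustration only, the sketch below uses the commonly seen soft-cosine form, a^T S b / sqrt((a^T S a)(b^T S b)), where S is a word similarity matrix. The function name `soft_cosine`, the three-word vocabulary, and the similarity values are all hypothetical and are not taken from the paper; in the paper's setting S would presumably come from the UCAG model.

```python
import math

def soft_cosine(a, b, sim):
    """Soft cosine between two term-frequency vectors a and b.

    sim[i][j] is an assumed word-to-word similarity with sim[i][i] == 1.
    When sim is the identity matrix this reduces to the classical cosine.
    """
    def bilinear(x, y):
        # Computes x^T * sim * y with plain loops.
        return sum(x[i] * sim[i][j] * y[j]
                   for i in range(len(x))
                   for j in range(len(y)))

    denom = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / denom if denom > 0 else 0.0

# Hypothetical three-word vocabulary; words 0 and 1 are treated as near-synonyms.
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
soft = [[1.0, 0.8, 0.0],
        [0.8, 1.0, 0.0],
        [0.0, 0.0, 1.0]]

sent_a = [1, 0, 1]  # contains words 0 and 2
sent_b = [0, 1, 1]  # contains words 1 and 2

hard_score = soft_cosine(sent_a, sent_b, identity)  # classical cosine
soft_score = soft_cosine(sent_a, sent_b, soft)      # credits the near-synonym pair
```

With the identity matrix the two sentences share only one word, so the score stays low; with the soft matrix the near-synonym pair also contributes, which is the kind of effect a dense similarity measure is meant to provide for sentence-level comparison.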
Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 318, 27 November 2018, Pages 236-247