Article code | Journal code | Publication year | English article | Full-text version
6926002 | 1448888 | 2018 | 16-page PDF | Free download
English title of the ISI article
Using semantic similarity to reduce wrong labels in distant supervision for relation extraction
Persian translation of the title
با استفاده از شباهت معنایی برای کاهش برچسب های اشتباه در نظارت از راه دور برای استخراج رابطه
Keywords
Distant supervision; Relation extraction; Wrong label; Semantic Jaccard; CNN
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Science Applications
English abstract
Distant supervision (DS) automatically generates large amounts of labelled training data and has been widely used for relation extraction. However, the automatically labelled data usually contain many wrong labels (Riedel, Yao, & McCallum, 2010). This paper presents a novel method to reduce these wrong labels. The proposed method uses the semantic Jaccard with word embedding to measure the semantic similarity between the relation phrase in the knowledge base and the dependency phrases between two entities in a sentence, and filters out wrong labels on that basis. While reducing wrong labels, the semantic Jaccard algorithm selects a core dependency phrase to represent the candidate relation in a sentence. This phrase captures features for relation classification and avoids the negative impact of irrelevant term sequences, from which previous neural network models of relation extraction often suffer. For relation classification, the core dependency phrases are also used as the input of a convolutional neural network (CNN). The experimental results show that methods using filtered DS data performed much better in relation extraction than methods using the original DS data, which indicates that the semantic-similarity-based method is effective in reducing wrong labels. The CNN model that takes the core dependency phrases as input achieved the best relation extraction performance of all, which indicates that the core dependency phrases are sufficient to capture the features for relation classification and can avoid the negative impact of irrelevant terms.
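The abstract does not give the exact definition of the semantic Jaccard, so the following is only a minimal sketch of one plausible reading: two words count as shared when the cosine similarity of their embeddings reaches a threshold, and the usual Jaccard ratio is computed over those soft matches. The toy vectors, the `threshold` value, the greedy matching, and the function names `semantic_jaccard` and `select_core_phrase` are all assumptions made for illustration, not the authors' implementation.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# A real system would load pretrained word embeddings instead.
TOY_EMBEDDINGS = {
    "born": [0.9, 0.1, 0.0],
    "birth": [0.85, 0.2, 0.05],
    "place": [0.1, 0.9, 0.1],
    "city": [0.15, 0.85, 0.2],
    "works": [0.0, 0.1, 0.95],
    "in": [0.3, 0.3, 0.3],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_jaccard(phrase_a, phrase_b, embeddings, threshold=0.8):
    """Soft Jaccard: words match if identical or embedding-similar.

    Assumed formulation: greedy one-to-one matching, then
    score = matched / (|A| + |B| - matched).
    """
    a, b = set(phrase_a), set(phrase_b)
    remaining = set(b)
    matched = 0
    for word in sorted(a):  # sorted for deterministic behaviour
        best, best_sim = None, threshold
        for cand in sorted(remaining):
            if word == cand:
                sim = 1.0
            elif word in embeddings and cand in embeddings:
                sim = cosine(embeddings[word], embeddings[cand])
            else:
                sim = 0.0  # out-of-vocabulary words only match exactly
            if sim >= best_sim:
                best, best_sim = cand, sim
        if best is not None:
            matched += 1
            remaining.discard(best)
    union = len(a) + len(b) - matched
    return matched / union if union else 0.0


def select_core_phrase(relation_phrase, dependency_phrases, embeddings):
    """Return (best_phrase, score) among candidate dependency phrases."""
    scored = [(semantic_jaccard(relation_phrase, p, embeddings), p)
              for p in dependency_phrases]
    score, phrase = max(scored, key=lambda item: item[0])
    return phrase, score


relation = ["place", "birth"]  # hypothetical knowledge-base relation phrase
candidates = [["works", "in"], ["born", "in", "city"]]
core, core_score = select_core_phrase(relation, candidates, TOY_EMBEDDINGS)
```

Under these assumptions, a distantly supervised label could be kept only when the best score exceeds a chosen cut-off, and the selected phrase could then serve as the classifier input; both the cut-off and that pipeline wiring would need to be taken from the full paper.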
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 54, Issue 4, July 2018, Pages 593-608