Article code: 515353
Journal code: 866998
Publication year: 2015
English article: 21 pages
Full-text version: PDF
English title of the ISI article
Bridging the vocabulary gap between questions and answer sentences
Keywords
Sentence retrieval, Language modeling, Word clustering, Triggering, Question answering
Related subjects
Engineering and Basic Sciences; Computer Engineering; Computer Science Applications
Abstract


• We introduce two novel language model (LM)-based retrieval models to relax the exact-matching assumption in information retrieval (IR).
• The class-based model clusters words to provide a coarse-grained word representation.
• The trigger model captures pairs of trigger and target words to find word relationships.
• Different types of word co-occurrence and triggering are studied within the models.
• We further study the combination of both models, which achieves the best result.
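The two models summarized in the points above can be illustrated with a small, self-contained sketch. It assumes the common class-based decomposition P(w|S) ≈ P(w|C(w))·P(C(w)|S), and a trigger model that averages P(q|s) over the words s of a candidate sentence, with trigger/target pairs counted whenever two distinct words share a training sentence (only one of several possible notions of co-occurrence). The toy corpus, the cluster map `CLUSTER_OF`, the mixing weights `a` and `b`, and every function name are hypothetical; this is not the authors' implementation or their exact smoothing scheme.

```python
import math
from collections import Counter
from itertools import permutations

# Hypothetical background corpus, used for word/class counts and trigger training.
CORPUS = [
    "the author wrote the novel in paris",
    "she wrote a famous book",
    "the writer published the book in london",
]
# Hypothetical word clusters (e.g. from a Brown-style clustering run).
# Words not listed here are treated as singleton classes.
CLUSTER_OF = {"novel": 1, "book": 1, "author": 2, "writer": 2, "wrote": 3, "published": 3}


def cls(word):
    return CLUSTER_OF.get(word, word)


BG = Counter(w for s in CORPUS for w in s.split())
N_BG = sum(BG.values())
CLASS_COUNT = Counter()
for w, c in BG.items():
    CLASS_COUNT[cls(w)] += c

# Trigger/target pairs: ordered pairs of distinct words sharing a training sentence.
PAIRS = Counter()
for s in CORPUS:
    PAIRS.update(permutations(set(s.split()), 2))
TARGET_TOTAL = Counter()
for (_, target), c in PAIRS.items():
    TARGET_TOTAL[target] += c


def p_bg(word):
    # Add-one smoothed background unigram probability; never zero.
    return (BG[word] + 1) / (N_BG + len(BG) + 1)


def p_class(word, sent):
    # P(w|S) ~= P(w|C(w)) * P(C(w)|S): any sentence word in w's cluster supports w.
    c = cls(word)
    if CLASS_COUNT[c] == 0:
        return 0.0
    p_w_given_c = BG[word] / CLASS_COUNT[c]
    p_c_given_s = sum(cls(t) == c for t in sent) / len(sent)
    return p_w_given_c * p_c_given_s


def p_trigger(word, sent):
    # (1/|S|) * sum over sentence words s of P(word | s), where the question
    # word is the trigger and the sentence word s is the target.
    total = sum(PAIRS[(word, s)] / TARGET_TOTAL[s] for s in sent if TARGET_TOTAL[s])
    return total / len(sent)


def score(question, sentence, a=0.4, b=0.4):
    # log P(Q|S) under a linear mix of class, trigger and background models.
    q, s = question.split(), sentence.split()
    return sum(
        math.log(a * p_class(w, s) + b * p_trigger(w, s) + (1 - a - b) * p_bg(w))
        for w in q
    )


question = "who wrote the book"
candidates = ["the weather was cold", "the writer published the novel"]
best = max(candidates, key=lambda s: score(question, s))
print(best)  # the writer published the novel
```

The best-ranked sentence shares no content word with the question except "the"; it wins because "book" and "novel", and "wrote" and "published", fall in the same hypothetical clusters and also co-occur as trigger/target pairs. Treating unclustered words as singleton classes makes the class model fall back to ordinary maximum-likelihood matching for words seen in the background corpus, and the background term keeps each mixed probability above zero so the logarithm is defined.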

We propose two novel language models to improve the performance of sentence retrieval in Question Answering (QA): a class-based language model and a trained trigger language model. Because sentence retrieval searches over smaller segments of text than document retrieval, the problems of data sparsity and exact matching become more critical. Other techniques, such as the translation model, have also been proposed to overcome the word mismatch problem. Our class-based and trained trigger language models, however, take different approaches to this problem and are shown to outperform the existing models. The class model uses a word clustering algorithm to capture term relationships: we assume that terms belonging to the same cluster are related, so they can be substituted for one another when searching for relevant sentences. The trigger model captures pairs of trigger and target words by training on a large corpus; it considers a question and a sentence related if a trigger word appears in the question and the sentence contains the corresponding target word. For both models, we introduce different notions of co-occurrence for finding word relations. In addition, we study the impact of corpus size and domain on the models. Our experiments on the TREC QA collection verify that the proposed models significantly improve sentence retrieval performance compared with the state-of-the-art translation model: while the translation model based on mutual information (Karimzadehgan and Zhai, 2010) achieves a Mean Average Precision (MAP) of 0.3927, the class model achieves 0.4174 and the trigger model raises it to 0.4381.
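The abstract reports results as Mean Average Precision (MAP). A minimal sketch of the standard, uninterpolated metric follows: average precision (AP) for one question is the mean of the precision values at each rank where a relevant sentence appears, divided over all relevant sentences, and MAP averages AP over questions. The function names and toy sentence IDs are illustrative assumptions; official TREC figures are normally computed with the trec_eval tool rather than code like this.

```python
def average_precision(ranked, relevant):
    """AP for one question: mean of precision@k over ranks k of relevant sentences."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, sent_id in enumerate(ranked, start=1):
        if sent_id in relevant:
            hits += 1
            precision_sum += hits / k
    # Divide by all relevant sentences, so relevant ones never retrieved count as zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP: average of per-question AP over (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["s3", "s1", "s7", "s2"], {"s1", "s2"}),  # AP = (1/2 + 2/4) / 2 = 0.5
    (["s5", "s9"], {"s5"}),                    # AP = (1/1) / 1 = 1.0
]
print(mean_average_precision(runs))  # 0.75
```

Applied to the reported figures, the class model's 0.4174 MAP is roughly a 6.3% relative gain over the 0.3927 translation-model baseline, and the trigger model's 0.4381 roughly an 11.6% gain.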

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 51, Issue 5, September 2015, Pages 595–615