Article code: 4942944
Journal code: 1437615
Publication year: 2018
English article: 16-page PDF
Full-text version: free download
English title of the ISI article
Paraphrase-focused learning to rank for domain-specific frequently asked questions retrieval
Keywords
Question answering; FAQ retrieval; Learning to rank; ListNet; LambdaMART; Convolutional neural networks
Related topics
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
English abstract


- We study the potential of supervised learning to rank for FAQ retrieval.
- Supervised models offer performance improvements for this task.
- We explored low-effort paraphrase-based data labeling strategies.
- Paraphrase-based labeling was effective for the best models on two FAQ data collections.
- We make a new FAQ retrieval data set publicly available.

A frequently asked questions (FAQ) retrieval system improves access to information by allowing users to pose natural language queries over an FAQ collection. From an information retrieval perspective, FAQ retrieval is a challenging task, mainly because of the lexical gap that exists between a query and an FAQ pair, both of which are typically very short. In this work, we explore the use of supervised learning to rank to improve the performance of domain-specific FAQ retrieval. While supervised learning-to-rank models have been shown to yield effective retrieval performance, they require costly human-labeled training data in the form of document relevance judgments or question paraphrases. We investigate how this labeling effort can be reduced using a labeling strategy geared toward the manual creation of query paraphrases rather than the more time-consuming relevance judgments. In particular, we investigate two such strategies, and test them by applying supervised ranking models to two domain-specific FAQ retrieval data sets, showcasing typical FAQ retrieval scenarios. Our experiments show that supervised ranking models can yield significant improvements in the precision-at-rank-5 measure compared to unsupervised baselines. Furthermore, we show that a supervised model trained using data labeled via a low-effort paraphrase-focused strategy matches the performance of the same model trained using fully labeled data, indicating that the strategy is effective at reducing the labeling effort while retaining the performance gains of the supervised approach. To encourage further research on FAQ retrieval we make our FAQ retrieval data set publicly available.
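The abstract contrasts costly relevance-judgment labeling with paraphrase-based labeling and reports results in terms of precision-at-rank-5. The sketch below is only a rough illustration of those two ideas, not the authors' implementation: it shows how manually written query paraphrases could be turned into (query, FAQ, relevance) training triples for a learning-to-rank model, and how precision@5 is computed over a ranked list. All data, names, and functions here are hypothetical.

```python
# Hypothetical sketch: paraphrase-based labeling and precision@5 for FAQ retrieval.
# Not the paper's code; the FAQ entries and paraphrases below are invented for illustration.

# Each FAQ entry has an id and a canonical question.
faqs = {
    "faq1": "How do I reset my password?",
    "faq2": "How can I change my billing address?",
}

# Paraphrase-based labeling: an annotator writes paraphrases of FAQ questions,
# which implicitly mark (paraphrase, FAQ) pairs as relevant without per-document judgments.
paraphrases = {
    "faq1": ["I forgot my password, how do I get a new one?"],
    "faq2": ["Where do I update the address on my invoice?"],
}

def labeled_triples(faqs, paraphrases):
    """Expand paraphrases into (query, faq_id, relevance) triples for training a ranker."""
    triples = []
    for source_id in faqs:
        for query in paraphrases.get(source_id, []):
            for candidate_id in faqs:
                relevance = 1 if candidate_id == source_id else 0
                triples.append((query, candidate_id, relevance))
    return triples

def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Precision at rank k: number of relevant FAQs in the top k, divided by k."""
    return sum(1 for fid in ranked_ids[:k] if fid in relevant_ids) / k

if __name__ == "__main__":
    for query, faq_id, rel in labeled_triples(faqs, paraphrases):
        print(rel, faq_id, query)
    # Toy ranking produced by some model for the first paraphrase of faq1.
    print(precision_at_k(["faq1", "faq2"], {"faq1"}, k=5))
```

A ranker such as ListNet or LambdaMART would be trained on triples of this form; the point of the paraphrase-focused strategy described in the abstract is that these labels come almost for free once paraphrases are written, whereas exhaustive relevance judgments require comparing every query against every FAQ entry.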

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 91, January 2018, Pages 418-433
نویسندگان