Free download of the paper: Studying the effect and treatment of misspelled queries in Cross-Language Information Retrieval

Paper ID	Journal ID	Year	English paper	Full text
514949	866917	2016	12-page PDF	Free download

English title of the ISI paper

Studying the effect and treatment of misspelled queries in Cross-Language Information Retrieval

Keywords

Queries; Cross-Language Information Retrieval; Machine translation; Spelling correction; Character n-grams

Related subjects

Engineering and Basic Sciences; Computer Engineering; Computer Science Software

English abstract

• We study the effects of misspelled queries on the performance of CLIR systems.
• Word-based approaches (as both indexing and translation units) are highly sensitive to the presence of misspellings.
• Correction mechanisms can significantly reduce the negative effects of misspellings.
• Classical techniques are suitable for shorter queries, while context-based corrections are suitable for longer queries.
• Our approach based on character n-grams (as both indexing and translation units) shows remarkable robustness.
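
The following sketch illustrates why character n-grams tolerate misspellings. It splits words into overlapping character n-grams and measures how many n-grams a misspelled form shares with the correct one. It is a minimal illustration, not the authors' implementation. The choice of n = 4, the absence of word-boundary padding, and the use of the Dice coefficient for comparison are all assumptions. A word-based index would treat the two forms as unrelated terms, while their n-gram sets still overlap.

```python
def char_ngrams(word: str, n: int = 4) -> list[str]:
    """Return the overlapping character n-grams of a word (no boundary padding).

    n = 4 is an illustrative assumption, not a value taken from the paper.
    """
    # Words no longer than n are kept whole so they still yield one index term.
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def dice_overlap(a: str, b: str, n: int = 4) -> float:
    """Dice coefficient between the n-gram sets of two words (illustrative measure)."""
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    correct = "información"
    for typo in ("informacion", "infromación"):
        # An exact word match would score 0 for both misspellings.
        print(typo, dice_overlap(correct, typo))
```

Dropping the accent in `información` leaves 6 of its 8 4-grams intact (Dice 0.75). The transposition in `infromación` still leaves 3 shared 4-grams (Dice 0.375). An exact word match would give no credit in either case.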

Unlike in monolingual retrieval, little attention has been paid to the effects that misspelled queries have on the performance of Cross-Language Information Retrieval (CLIR) systems. The present work makes a first attempt to fill this gap by extending our previous work on monolingual retrieval. Here we study how the progressive addition of misspellings to input queries affects the output of CLIR systems. Two approaches for dealing with this problem are analyzed. The first is the use of automatic spelling correction techniques, for which we consider two algorithms: one that corrects isolated words and one that performs corrections based on the linguistic context of the misspelled word. The second is the use of character n-grams as both index terms and translation units, seeking to take advantage of their inherent robustness and language independence. All these approaches have been tested on a Spanish-to-English CLIR system, that is, Spanish queries on English documents. Real, user-generated spelling errors have been used under a methodology that allows us to study the effectiveness of the different approaches and their behavior under different error rates. The results show the high sensitivity of classic word-based approaches to misspelled queries, although spelling correction techniques can mitigate these negative effects. On the other hand, the use of character n-grams provides great robustness against misspellings.
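
The abstract distinguishes isolated-word correction from context-based correction. The sketch below is a generic stand-in for the first kind and is not necessarily the algorithm evaluated in the paper. It returns the lexicon entries closest to a query word by Levenshtein edit distance. The toy Spanish lexicon and the `max_distance=2` threshold are hypothetical. When several entries are equally close, all of them are returned, because an isolated-word corrector has no basis for choosing among them. Resolving such ties is what a context-based corrector adds.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct_word(word: str, lexicon: set[str], max_distance: int = 2) -> list[str]:
    """Isolated-word correction: return all closest lexicon entries (hypothetical helper)."""
    # Words already in the lexicon are assumed to be correctly spelled.
    if word in lexicon:
        return [word]
    best_distance = max_distance + 1
    candidates: list[str] = []
    for entry in lexicon:
        d = levenshtein(word, entry)
        if d < best_distance:
            best_distance, candidates = d, [entry]
        elif d == best_distance and d <= max_distance:
            candidates.append(entry)
    # Fall back to the original word when nothing is close enough.
    return sorted(candidates) if candidates else [word]


if __name__ == "__main__":
    toy_lexicon = {"casa", "cosa", "caso", "perro"}  # hypothetical lexicon
    print(correct_word("cesa", toy_lexicon))
```

With this toy lexicon, `cesa` is one edit away from both `casa` and `cosa`, so both are returned. Choosing between them requires the surrounding query words.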

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 52, Issue 4, July 2016, Pages 646–657

Authors

Jesús Vilares, Miguel A. Alonso, Yerai Doval, Manuel Vilares
