Article code: 515683
Journal code: 867069
Publication year: 2011
English article: 24 pages
Full-text version: PDF
English title (ISI article)
Managing misspelled queries in IR applications
Related subjects
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
Abstract

Our work concerns the design of robust information retrieval (IR) environments that can successfully handle queries containing misspelled words. Our aim is to perform a comparative analysis of the efficacy of two strategies that can be adopted for this purpose.

A first strategy comprises approaches based on correcting the misspelled query, which requires integrating linguistic information into the system. This solution has been studied from two complementary standpoints, according to whether or not contextual linguistic information is integrated into the correction process; integrating it implies a higher degree of complexity.

A second strategy uses character n-grams as the basic indexing unit, which makes the retrieval process robust while eliminating the need for a specific query correction stage. This is a knowledge-light, language-independent solution that requires no linguistic information to apply.

Both strategies have been tested experimentally, with Spanish as the case in point. Unlike English, Spanish has a great variety of morphological processes, which makes it particularly sensitive to spelling errors.

The results show that stemming-based approaches are highly sensitive to misspelled queries, particularly short ones. However, this negative impact can be effectively reduced by applying correction mechanisms at query time, particularly context-based correction, since more classical approaches introduce too much noise as query length increases. Our n-gram-based strategy, on the other hand, shows remarkable robustness, with average performance losses appreciably smaller than those for stemming.
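
To make the classical (context-free) correction route concrete, here is a minimal Python sketch of isolated-word correction by Levenshtein distance against a lexicon. It is not the authors' method: the toy lexicon and the names `levenshtein`, `candidates` and `correct_query` are hypothetical, and keeping every tied candidate is an assumption chosen to show how a corrector without context can add noise to a query.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def candidates(word: str, lexicon: set[str], max_dist: int = 2) -> list[str]:
    """Return every lexicon entry at the minimal distance, if <= max_dist.

    Known words are returned unchanged; words with no close entry are kept
    as typed. Ties are all kept, since no context is used to break them.
    """
    if word in lexicon:
        return [word]
    scored = [(levenshtein(word, entry), entry) for entry in lexicon]
    best = min((d for d, _ in scored), default=None)
    if best is None or best > max_dist:
        return [word]
    return sorted(entry for d, entry in scored if d == best)


def correct_query(query: str, lexicon: set[str],
                  max_dist: int = 2) -> list[list[str]]:
    """Correct each query term independently (context-free correction)."""
    return [candidates(term, lexicon, max_dist)
            for term in query.lower().split()]


if __name__ == "__main__":
    # Hypothetical toy Spanish lexicon, for illustration only.
    lexicon = {"casa", "caso", "cosa", "perro", "grande"}
    print(correct_query("csa grande", lexicon))  # [['casa', 'cosa'], ['grande']]
```

Here the misspelling `csa` is equally close to `casa` and `cosa`, so a corrector that ignores the surrounding words cannot choose between them. Expanding the query with both is one plausible source of the noise attributed to classical approaches; a context-based corrector would instead try to pick the reading that fits the rest of the query.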

Research highlights
- Stemming-based approaches are highly sensitive to misspelled queries.
- The impact of misspellings can be effectively reduced by the use of correction mechanisms.
- Context-based correction is particularly effective.
- More classical correction approaches introduce much more noise.
- N-gram-based indexing shows remarkable robustness.
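
The robustness of character n-gram indexing can be illustrated with a small sketch. The code below splits text into overlapping character 4-grams and measures the Jaccard overlap between a word and a misspelled variant. The choice n = 4, keeping short words whole, and the names `char_ngrams` and `jaccard` are assumptions for illustration, not parameters or functions taken from the paper.

```python
from __future__ import annotations


def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Split text into overlapping character n-grams, word by word.

    Words no longer than n are kept whole (an assumption for this sketch).
    """
    grams: list[str] = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def jaccard(a: list[str], b: list[str]) -> float:
    """Jaccard overlap between the sets of units in two token lists."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    correct = char_ngrams("informacion")
    typo = char_ngrams("imformacion")  # 'n' mistyped as 'm'
    print(f"4-gram overlap: {jaccard(correct, typo):.2f}")  # 0.60
    print(f"whole-word overlap: {jaccard(['informacion'], ['imformacion']):.2f}")  # 0.00
```

A single substitution only destroys the n-grams that contain it, so `informacion` and `imformacion` still share 6 of their 10 distinct 4-grams, whereas exact matching on the whole word finds no overlap at all. This is why an n-gram index can still retrieve relevant documents without any explicit correction step.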

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 47, Issue 2, March 2011, Pages 263–286
Authors
, , ,