Article code: 515079 | Journal code: 866949 | Publication year: 2014 | English article: 36 pages | Full text: PDF
English title of the ISI article
Character n-gram application for automatic new topic identification
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Science Applications
Abstract


• We used the character n-gram method to predict topic changes in search engine queries.
• We obtained more accurate topic-shift estimates than previous studies.
• We compared the character n-gram method with the Levenshtein edit-distance method.
• We evaluated ASPELL and the Google and Bing search engines as spelling-correction pre-processing methods.
• We conclude that Google can be used as a spelling-correction pre-processing step.

The widespread availability of the Internet and the variety of Internet-based applications have led to a significant increase in the number of web pages. Determining the behavior of search engine users has become a critical step in improving search engine performance. This behavior can be determined by content-based or content-ignorant algorithms. Many content-ignorant studies have been performed to identify new topics automatically, but previous results have shown that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates caused by spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with those of previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. We also analyzed the performance of the character n-gram method from several perspectives, including a comparison with the Levenshtein edit-distance method. The experimental results showed that the character n-gram method outperformed the Levenshtein edit-distance method in topic identification.
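The abstract does not specify how query similarity is computed. The following is a minimal sketch, and every choice in it is an assumption rather than necessarily the paper's:

- character trigrams over lower-cased, space-padded queries;
- the Dice coefficient as the n-gram similarity;
- Levenshtein distance normalized by the longer query's length;
- a hypothetical `TOPIC_SHIFT_THRESHOLD`.

The authors' neural network is replaced here by that simple threshold rule. The sketch therefore illustrates only the similarity signal that distinguishes "same topic" from "new topic" between consecutive queries. It is not the proposed hybrid algorithm.

```python
from collections.abc import Callable

# Hypothetical cut-off, chosen only for illustration (not from the paper).
TOPIC_SHIFT_THRESHOLD = 0.3


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Set of character n-grams of a lower-cased query padded with one space on each side."""
    padded = f" {text.lower().strip()} "
    if len(padded) < n:
        return {padded}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient between the character n-gram sets of two queries (0.0 to 1.0)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer query, so identical queries score 1.0."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b)) or 1
    return 1 - levenshtein(a, b) / longest


def shift_flags(queries: list[str],
                similarity: Callable[[str, str], float],
                threshold: float = TOPIC_SHIFT_THRESHOLD) -> list[bool]:
    """For each consecutive query pair, True if similarity drops below the threshold."""
    return [similarity(prev, curr) < threshold
            for prev, curr in zip(queries, queries[1:])]


if __name__ == "__main__":
    session = ["neural network", "nueral netwrok", "pizza recipe"]
    print(ngram_similarity(session[0], session[1]))        # Dice on trigrams
    print(levenshtein_similarity(session[0], session[1]))  # normalized edit distance
    print(shift_flags(session, ngram_similarity))
```

The similarity function is passed in as a parameter, so the same session can be labeled once with the n-gram measure and once with the Levenshtein measure. That mirrors, in miniature, the kind of comparison the abstract reports. A misspelled repeat such as "nueral netwrok" still shares several trigrams with "neural network" ("ral", "net", "etw", ...), which is why n-gram overlap tolerates typos to some degree.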

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 50, Issue 6, November 2014, Pages 821–856