Article code: 423091
Journal code: 685171
Publication year: 2006
English article: 13-page PDF
Full-text version: free download
English title of the ISI article
Fast and Flexible Compression for Web Search Engines
Related topics
Engineering and Basic Sciences > Computer Engineering > Computational Theory and Mathematics
Abstract

In this paper we present the adaptation of a compression technique, specially designed to compress large textual databases, to the peculiarities of web search engines.

The (s,c)-Dense Code belongs to a new category of compression techniques [Silva de Moura, E., G. Navarro, N. Ziviani and R. Baeza-Yates, Fast and flexible word searching on compressed text, ACM Transactions on Information Systems 18 (2000), pp. 113–139; Brisaboa, N., A. Fariña, G. Navarro and M. Esteller, (s,c)-dense coding: An optimized compression code for natural language text databases, in: Proc. 10th International Symposium on String Processing and Information Retrieval (SPIRE 2003), LNCS 2857, 2003, pp. 122–136] that allow fast and flexible search directly on compressed files. However, these methods are only suitable for large natural-language texts of at least 1 megabyte; on smaller texts they do not achieve an attractive amount of compression.

In order to take advantage of the search capabilities of these techniques (they allow searches on compressed files up to eight times faster than searching on the plain versions [Silva de Moura, E., G. Navarro, N. Ziviani and R. Baeza-Yates, Fast and flexible word searching on compressed text, ACM Transactions on Information Systems 18 (2000), pp. 113–139]), we present a modification of the basic compression technique, (s,c)-Dense Code, that achieves reasonable compression ratios with small files, a requirement when working with search engines.
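As a rough illustration of the basic technique the abstract builds on, the following is a minimal sketch of byte-oriented (s,c)-dense coding as it is commonly described: each word is replaced by a codeword made of zero or more "continuer" bytes followed by exactly one "stopper" byte, and more frequent words (lower ranks) get shorter codewords. The function names `sc_encode` and `sc_decode`, the 0-based rank convention, and the example values s = 190, c = 66 are assumptions made for illustration. This is not the small-file modification that the paper presents.

```python
def sc_encode(rank: int, s: int, c: int) -> bytes:
    """Encode a 0-based word rank with s stoppers and c continuers.

    Stopper bytes are the values 0 .. s-1 and continuer bytes are the
    values s .. s+c-1 (an assumed layout for this sketch).
    """
    if rank < 0 or s < 1 or c < 1 or s + c > 256:
        raise ValueError("need rank >= 0, s >= 1, c >= 1 and s + c <= 256")
    out = [rank % s]               # last byte of the codeword: a stopper
    x = rank // s
    while x > 0:
        x -= 1
        out.append(s + x % c)      # earlier bytes: continuers
        x //= c
    return bytes(reversed(out))    # continuers first, stopper last


def sc_decode(code: bytes, s: int, c: int) -> int:
    """Invert sc_encode for a single, complete codeword."""
    rank = 0
    for b in code[:-1]:            # every byte except the last is a continuer
        rank = rank * c + (b - s) + 1
    return rank * s + code[-1]     # the final byte is the stopper


if __name__ == "__main__":
    s, c = 190, 66                 # illustrative split of the 256 byte values
    for r in (0, 189, 190, 12729, 12730):
        print(r, list(sc_encode(r, s, c)))
```

Because only the last byte of a codeword is smaller than s, codeword boundaries can be recognised without decoding, which is the property that makes searching directly on the compressed bytes possible. With s = 190 and c = 66, the 190 most frequent words take one byte each and the next 190 × 66 words take two bytes.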

Publisher
Database: Elsevier - ScienceDirect
Journal: Electronic Notes in Theoretical Computer Science - Volume 142, 3 January 2006, Pages 129-141