کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
482163 | 1446206 | 2007 | 15 صفحه PDF | دانلود رایگان |
عنوان انگلیسی مقاله ISI
Calculating distances for dissimilar strings: The shortest path formulation revisited
دانلود مقاله + سفارش ترجمه
دانلود مقاله ISI انگلیسی
رایگان برای ایرانیان
کلمات کلیدی
موضوعات مرتبط
مهندسی و علوم پایه
مهندسی کامپیوتر
علوم کامپیوتر (عمومی)
پیش نمایش صفحه اول مقاله
چکیده انگلیسی
Fast detection of string differences is a prerequisite for string clustering problems. An example of such a problem is the identification of duplicate information in the data cleansing stage of the data mining process. The relevant algorithms allow the application of large-scale clustering techniques in order to create clusters of similar strings. The vast majority of comparisons, in such cases, is between very dissimilar strings, therefore methods that perform better at detecting large differences are preferable. This paper presents approaches which comply with this requirement, based on reformulation of the underlying shortest path problem. It is believed that such methods can lead to a family of new algorithms. An upper bound algorithm is presented, as an example, which produces promising results.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: European Journal of Operational Research - Volume 177, Issue 1, 16 February 2007, Pages 525-539
Journal: European Journal of Operational Research - Volume 177, Issue 1, 16 February 2007, Pages 525-539
نویسندگان
K. Spiliopoulos, S. Sofianopoulou,