Article code: 382202
Journal code: 660745
Publication year: 2016
English article: 9-page PDF
Full-text version: free download
English title of the ISI article
Combination of genetic network programming and knapsack problem to support record clustering on distributed databases
Keywords
Genetic network programming; Database clustering; Knapsack problem; Record clustering
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
Abstract


• A decision support algorithm for record clustering in databases is proposed.
• A capacity limitation problem is introduced to make the clustering application more general.
• Rule extraction from datasets is realized by the proposed evolutionary algorithm.
• Rule clustering under capacity limitations is solved as a knapsack problem.
• Simulations of record clustering show some advantages of the proposed method.

This research combines genetic network programming (GNP) with standard dynamic programming for the knapsack problem (KP) to build a decision support system for record clustering in distributed databases. The method is motivated by fragment allocation under storage capacity limits: sets of fragments must be distributed across several sites (clusters) so that the total amount of fragments at each site does not exceed that site's capacity, while the relation (similarity) between fragments within each site is preserved. The objective is to distribute big data to sites with limited capacities while taking into account the similarity of the data placed at each site. To solve this problem, GNP extracts rules from big data by considering the characteristics (value ranges) of each attribute in a dataset. The proposed method also provides a partial random rule extraction method in GNP that discovers frequent patterns in a database to improve the clustering algorithm, especially for large data problems. The KP concept is applied to the storage capacity problem: standard dynamic programming distributes rules to each site by considering the similarity (value) and data amount (weight) associated with each rule so as to match the site capacities. Simulation results show that the proposed method has some advantages over conventional clustering algorithms, so it offers a new clustering method that also handles the storage capacity problem.
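
To make the knapsack step concrete, the following is a minimal sketch of the standard 0/1 knapsack dynamic program, with each rule carrying a similarity score (value) and a data amount (weight). It is not the authors' implementation. The function names `knapsack` and `allocate_to_sites`, the integer weights, and the policy of filling sites one after another are all assumptions made for illustration; the abstract does not specify how rules are ordered across sites.

```python
from typing import List, Tuple


def knapsack(values: List[int], weights: List[int], capacity: int) -> Tuple[int, List[int]]:
    """Standard 0/1 knapsack by dynamic programming.

    Returns the best total value and the indices of the chosen rules.
    """
    n = len(values)
    # best[i][c] = maximum total value using the first i rules within capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # skip rule i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # take rule i-1

    # Walk the table backwards to recover which rules were taken.
    chosen = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen


def allocate_to_sites(values: List[int], weights: List[int],
                      capacities: List[int]) -> List[List[int]]:
    """Hypothetical multi-site policy (an assumption, not from the paper):
    solve one knapsack per site over the rules not yet placed."""
    remaining = list(range(len(values)))
    plan = []
    for cap in capacities:
        vals = [values[i] for i in remaining]
        wts = [weights[i] for i in remaining]
        _, picked = knapsack(vals, wts, cap)
        site_rules = [remaining[j] for j in picked]
        plan.append(site_rules)
        remaining = [i for i in remaining if i not in site_rules]
    return plan
```

For example, with similarity values `[60, 100, 120]`, data amounts `[10, 20, 30]`, and a site capacity of 50, `knapsack` selects rules 1 and 2 for a total value of 220. Rules that fit no site simply remain unallocated under this sketch.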

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 46, 15 March 2016, Pages 15–23