Article ID: 382202
Journal: Expert Systems with Applications
Published Year: 2016
Pages: 9 Pages
File Type: PDF
Abstract

• A decision support algorithm for record clustering in databases is proposed.
• A capacity limitation problem is introduced to make the clustering application more general.
• Rules are extracted from datasets by the proposed evolutionary algorithm.
• Rule clustering under capacity limitation is solved as a knapsack problem.
• Simulations of record clustering show some advantages of the proposed method.

This research implements genetic network programming (GNP) and standard dynamic programming to solve the knapsack problem (KP) as a decision support system for record clustering in distributed databases. The background of the proposed method is the fragment allocation problem with a storage capacity limitation: sets of fragments must be distributed across several sites (clusters) so that the total amount of fragments at each site does not exceed that site's capacity, while the distribution preserves the relation (similarity) between the fragments within each site. The objective is to distribute big data to sites of limited capacity while taking into account the similarity of the data placed at each site. To solve this problem, GNP is used to extract rules from big data by considering the characteristics (value ranges) of each attribute in a dataset. The proposed method also provides a partial random rule extraction method in GNP that discovers frequent patterns in a database in order to improve the clustering algorithm, especially for large data problems. The concept of KP is applied to the storage capacity problem, and standard dynamic programming is used to distribute rules to each site by considering the similarity (value) and data amount (weight) associated with each rule so as to match the site capacities. The simulation results show that the proposed method has some advantages over conventional clustering algorithms. The proposed method therefore provides a new clustering method that additionally handles the storage capacity problem.
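
The knapsack step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each rule carries a single integer weight (the data amount it covers) and a single integer value (a similarity score), and it fills the sites one after another, which is a simplification chosen only for illustration. The names `Rule`, `knapsack_select`, and `allocate` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical rule representation: (name, weight, value), where
# weight = data amount covered by the rule, value = similarity score.
Rule = Tuple[str, int, int]


def knapsack_select(rules: Sequence[Rule], capacity: int) -> List[int]:
    """Standard 0/1 knapsack by dynamic programming.

    Returns the indices of the rules that maximize total value while
    keeping total weight within `capacity`.
    """
    n = len(rules)
    # best[i][c] = best total value using the first i rules with capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        _, weight, value = rules[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # skip rule i-1
            if weight <= c and best[i - 1][c - weight] + value > best[i][c]:
                best[i][c] = best[i - 1][c - weight] + value  # take rule i-1

    # Walk the table backwards to recover which rules were taken.
    chosen: List[int] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= rules[i - 1][1]
    chosen.reverse()
    return chosen


def allocate(rules: Sequence[Rule], capacities: Sequence[int]):
    """Assumed simplification: fill sites sequentially, one knapsack per site.

    Returns (rule names per site, names of rules that fit nowhere).
    """
    remaining = list(rules)
    sites: List[List[str]] = []
    for capacity in capacities:
        picked = knapsack_select(remaining, capacity)
        picked_set = set(picked)
        sites.append([remaining[i][0] for i in picked])
        remaining = [r for i, r in enumerate(remaining) if i not in picked_set]
    return sites, [r[0] for r in remaining]


if __name__ == "__main__":
    # Made-up example data, not taken from the paper.
    example_rules = [("r1", 4, 10), ("r2", 3, 7), ("r3", 5, 12), ("r4", 2, 3)]
    print(allocate(example_rules, [7, 7]))
```

In the paper's setting the value of a rule would presumably depend on its similarity to the data already at a site. The fixed per-rule value used here is an assumption that keeps the sketch short.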

Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence