Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4951780 | Science of Computer Programming | 2017 | 12 Pages | |
Abstract
This paper introduces a parallel set similarity join method built on the MapReduce programming model. The method uses Locality-Sensitive Hashing (LSH) to reduce the number of pairwise comparisons needed to compute the similarity of the sets. Its running time is compared with that of the best previous similarity join methods on real and synthetic datasets. The experimental results show that the proposed method runs faster than those earlier methods.
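To make the idea concrete, the following is a minimal single-process sketch of how LSH can prune comparisons in a set similarity join. It is not the authors' algorithm, which the abstract does not describe. It assumes Jaccard similarity, MinHash signatures, and the standard banding scheme, and it simulates the map, shuffle, and reduce steps in memory. All names (`minhash_signature`, `map_phase`, `lsh_similarity_join`) and the parameters `bands` and `rows` are hypothetical, and input sets are assumed to be non-empty.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes):
    """MinHash signature: the minimum hash value per seeded hash function."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def map_phase(set_id, tokens, bands, rows):
    """Map step: emit one ((band index, band signature), set_id) pair per band."""
    sig = minhash_signature(tokens, bands * rows)
    for b in range(bands):
        yield (b, sig[b * rows:(b + 1) * rows]), set_id


def lsh_similarity_join(sets, threshold, bands=16, rows=4):
    """Return {(id_a, id_b): similarity} for candidate pairs at or above threshold."""
    # Shuffle step (simulated): group set ids by band key.
    buckets = defaultdict(list)
    for set_id, tokens in sets.items():
        for key, value in map_phase(set_id, tokens, bands, rows):
            buckets[key].append(value)

    # Reduce step: only sets sharing a bucket become candidate pairs.
    candidates = set()
    for ids in buckets.values():
        candidates.update(combinations(sorted(ids), 2))

    # Verification: compute exact similarity for candidates only.
    result = {}
    for a, b in candidates:
        sim = jaccard(sets[a], sets[b])
        if sim >= threshold:
            result[(a, b)] = sim
    return result
```

Because only sets that collide in at least one band are verified, most dissimilar pairs are never compared. The trade-off is that LSH is probabilistic: a pair above the threshold can be missed if it collides in no band, and the choice of `bands` and `rows` controls that risk.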
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Mohammad Karim Sohrabi, Hossein Azgomi