Article ID: 4951780
Journal: Science of Computer Programming
Published Year: 2017
Pages: 12
File Type: PDF
Abstract
This paper introduces a parallel set similarity join method based on the MapReduce programming model. The method uses Locality Sensitive Hashing (LSH) techniques to reduce the number of pairwise comparisons required to compute the similarity of the sets. Its performance is compared, in terms of running time, with the best previous similarity join methods on real and synthetic datasets. The experimental results show that the proposed method runs faster than the previous methods.
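The abstract does not specify the LSH scheme or the MapReduce job layout, so the following is only a minimal single-process sketch of the general idea, assuming Jaccard similarity, MinHash signatures, and the standard banding technique. All function names and the parameters `bands` and `rows` are hypothetical and are not taken from the paper. The map step emits each record once per band, keyed by its band signature; the reduce step groups records that share a key, and only those candidate pairs are verified with an exact similarity computation.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes):
    """MinHash signature: the minimum hash value per seeded hash function.

    Assumes `tokens` is non-empty.
    """
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def map_phase(records, bands, rows):
    """Map: emit ((band index, band signature), record id) for each band."""
    for rid, tokens in records.items():
        sig = minhash_signature(tokens, bands * rows)
        for b in range(bands):
            yield (b, sig[b * rows:(b + 1) * rows]), rid


def reduce_phase(pairs):
    """Shuffle and reduce: group ids by key, emit candidate pairs per bucket."""
    buckets = defaultdict(list)
    for key, rid in pairs:
        buckets[key].append(rid)
    candidates = set()
    for ids in buckets.values():
        for x, y in combinations(sorted(ids), 2):
            candidates.add((x, y))
    return candidates


def lsh_similarity_join(records, threshold, bands=16, rows=2):
    """Verify only the LSH candidate pairs with exact Jaccard similarity."""
    candidates = reduce_phase(map_phase(records, bands, rows))
    result = {}
    for x, y in candidates:
        sim = jaccard(records[x], records[y])
        if sim >= threshold:
            result[(x, y)] = sim
    return result


if __name__ == "__main__":
    records = {
        "a": {1, 2, 3},
        "b": {1, 2, 3},
        "c": {100, 200, 300},
    }
    print(lsh_similarity_join(records, threshold=0.9))
```

In this sketch, identical sets always share every band and are therefore always compared, while pairs that collide in no band are never compared. That skipping is where the reduction in comparisons comes from, at the cost of possibly missing some similar pairs. In an actual MapReduce deployment, the key emitted by `map_phase` would serve as the shuffle key, and each reducer would process one bucket.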
Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics