Article ID | Journal | Published Year | Pages |
---|---|---|---|
553419 | Decision Support Systems | 2015 | 12 Pages |
• The order of identity feature weights for effective identity matching is: Passport > Name > Postal Address > Date of birth.
• A q-gram length of 4 produced the best performance; increasing the q-gram length also increases efficiency.
• Pruning q-gram blocks that contain more than 10% of the total records from the index produced the best performance.
• Overall, the proposed approach performs better than the adaptive detection identity matching technique.
Information overload is a growing problem for information management and analytics in many organizations. Identity matching techniques are used to manage and resolve millions of identity records in diverse domains such as health care information, telecom subscribers, insurance holders, law offenders, and the census. In this paper, we propose an identity matching technique that is efficient for large datasets without compromising matching effectiveness. Our experimental results provide strong evidence that our proposed identity matching technique outperforms the adaptive detection identity matching technique in terms of efficiency and effectiveness, reducing the number of required comparisons by almost 98% and the completion time by 97%, with promising scalability results. Furthermore, our proposed technique achieves better matching results than the most trusted pairwise identity matching approach.
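To make the q-gram blocking and block pruning mentioned in the highlights more concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`qgrams`, `build_index`, `prune_index`, `candidate_pairs`), the whitespace and case normalization, and the use of a single string field per record are all assumptions made for illustration. Only the q-gram length of 4 and the 10% pruning threshold come from the text above.

```python
from collections import defaultdict
from itertools import combinations


def qgrams(text, q=4):
    """Return the set of overlapping q-grams of a normalized string.

    Normalization (lowercasing, removing whitespace) is an assumption.
    """
    s = "".join(text.lower().split())
    if len(s) < q:
        # Strings shorter than q form a single block key (or none if empty).
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def build_index(records, q=4):
    """Map each q-gram (block key) to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, value in records.items():
        for gram in qgrams(value, q):
            index[gram].add(rec_id)
    return index


def prune_index(index, n_records, max_fraction=0.10):
    """Drop blocks holding more than max_fraction of all records.

    Very common q-grams generate many comparisons but discriminate poorly.
    """
    limit = max_fraction * n_records
    return {gram: ids for gram, ids in index.items() if len(ids) <= limit}


def candidate_pairs(index):
    """Collect the record pairs that share at least one surviving block."""
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy data: record id -> name field.
records = {1: "John Smith", 2: "Jon Smith", 3: "Mary Jones", 4: "Marie Jones"}
index = build_index(records, q=4)
# With only four records, a 10% limit would prune every shared block,
# so this toy example uses a looser threshold of 50%.
pruned = prune_index(index, len(records), max_fraction=0.5)
print(candidate_pairs(pruned))
```

Only the pairs returned by `candidate_pairs` would then go to a detailed field-by-field comparison, which is how blocking of this kind reduces the number of comparisons relative to comparing every record with every other record.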