Article ID: 4944824
Journal: Information Sciences
Published Year: 2017
Pages: 40
File Type: PDF
Abstract
Similarity search (or similar entity search) is the process of finding all entities similar to a given entity (e.g., a person, a document, or an image). Although many techniques for similarity analysis have been proposed, little work has addressed which of these techniques is most suitable for a given similarity search task. Choosing the right similarity function is important because the task is highly domain- and data-dependent. In this article, we provide an approach for recommending which similarity functions (e.g., edit distance or Jaccard similarity) should be used to measure the similarity between two entities. The approach employs an incremental knowledge acquisition technique to capture domain experts' knowledge about similarity functions and their usage contexts (e.g., entity class, attribute name, and keywords). In addition, for situations where domain experts have little or no knowledge about the datasets, we analyze the features of the datasets and then suggest similarity functions based on the identified features. We also demonstrate the feasibility and effectiveness of the proposed approach on several real-world datasets from different domains.
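To make the two similarity functions named in the abstract concrete, the following is a minimal Python sketch of edit distance (in its Levenshtein form) and Jaccard similarity over word tokens, together with a toy lookup that picks a function from a usage context. The lookup table `RULES`, the function `recommend`, and the example entity classes and attribute names are hypothetical and serve only as illustration; they are not the knowledge acquisition technique or the rule base described in the article.

```python
from typing import Callable, Dict, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized to a similarity score in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercased whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical context-to-function table (assumption, not from the article):
# the key is (entity class, attribute name).
RULES: Dict[Tuple[str, str], Callable[[str, str], float]] = {
    ("person", "name"): edit_similarity,
    ("document", "title"): jaccard_similarity,
}


def recommend(entity_class: str, attribute: str) -> Callable[[str, str], float]:
    """Return the function registered for a context, else a default."""
    return RULES.get((entity_class, attribute), jaccard_similarity)


if __name__ == "__main__":
    sim = recommend("person", "name")
    print(sim.__name__, sim("Jon Smith", "John Smith"))
```

In this sketch, short strings with likely typos (such as person names) are routed to the character-level measure, while longer multi-word fields are routed to the token-level measure. That split is one plausible reading of why the choice of function depends on the domain and the data.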
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence