Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4945141 | Information Systems | 2017 | 12 Pages |
Abstract
With the size digital collections are currently reaching, retrieving the best match for a document from a large collection by comparing hundreds of tags is a task of considerable algorithmic complexity, even more so if the number of tags in the collection is not fixed. For these cases, similarity search appears to be the best retrieval method, but there is a lack of techniques suited to these conditions. This work presents a combination of machine learning algorithms that finds the object most similar to a given one in a set of pre-processed objects, based only on their metadata tags. The algorithm represents objects as character frequency curves and is capable of finding relationships between objects that have no apparent association. The search can also be parallelized using MapReduce strategies. The method can be applied to a wide variety of documents with metadata tags. The case study used in this work to demonstrate the similarity search technique is a collection of image objects in JavaScript Object Notation (JSON) containing metadata tags.
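As a rough illustration of the idea of comparing objects through character frequency curves, the sketch below turns each object's tags into a vector of relative character frequencies and returns the stored object whose vector is closest to the query. It is not the authors' algorithm: the abstract does not specify the alphabet, the similarity measure, or the JSON layout, so the `ALPHABET` constant, the use of cosine similarity, the `id` and `tags` field names, and every function name here are assumptions made for the example.

```python
import json
import math
from collections import Counter

# Assumed alphabet for the curve; the abstract does not say which characters are counted.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

# Hypothetical collection layout: a JSON array of objects with "id" and "tags".
SAMPLE_JSON = """
[
  {"id": "img1", "tags": ["beach", "sunset", "ocean"]},
  {"id": "img2", "tags": ["mountain", "snow", "hiking"]},
  {"id": "img3", "tags": ["sea", "sunrise", "coast"]}
]
"""


def char_frequency_curve(tags):
    """Return the relative frequency of each ALPHABET character over all tags."""
    text = "".join(tags).lower()
    counts = Counter(ch for ch in text if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        # No countable characters: return an all-zero curve.
        return [0.0] * len(ALPHABET)
    return [counts[ch] / total for ch in ALPHABET]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_tags, collection):
    """Return (object_id, score) for the best match in a non-empty collection.

    collection maps an object id to its list of tags.
    """
    query_curve = char_frequency_curve(query_tags)
    scored = (
        (obj_id, cosine_similarity(query_curve, char_frequency_curve(tags)))
        for obj_id, tags in collection.items()
    )
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    collection = {obj["id"]: obj["tags"] for obj in json.loads(SAMPLE_JSON)}
    best_id, score = most_similar(["ocean", "sunset", "waves"], collection)
    print(best_id, round(score, 3))
```

Because each stored object is scored independently of the others, the scoring loop has the shape of a map step and the final `max` that of a reduce step, which is one plausible reading of how such a search could be spread across workers.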
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
L. Sánchez, Javier Alfonso-Cendón, Tiago Oliveira, Joaquín B. Ordieres-Meré, Manuel Castejón Limas, Paulo Novais