Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6856198 | Information Sciences | 2018 | 15 Pages | |
Abstract
Person Name Disambiguation on the Web is the problem of grouping the web pages that a search engine returns for a person-name query according to the individual each page refers to. This problem has been addressed in a monolingual scenario, where all the search results are written in the same language. However, search engines can also return links to web pages written in different languages. We study how to handle multilingualism in this problem using the MC4WePS data set, a recent gold standard that includes real search results written in different languages. To this end, we first analyze whether a translation tool is suitable for treating multilingualism with two state-of-the-art clustering algorithms. Since such tools increase the processing time of the disambiguation process, we then propose an approach to multilingualism that generalizes the monolingual scenario and does not require any translation resources. On the gold standard, our approach obtains better results than the translation-based approaches, making it a competitive choice in a real scenario.
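As a toy illustration of the task itself (not the authors' method, which this page does not describe), the sketch below groups search-result snippets into one cluster per person. The helper `cluster_pages`, the Jaccard similarity over word tokens, and the 0.3 threshold are all hypothetical choices made for this example. In it, shared proper nouns such as names and institutions happen to let an English and a Spanish snippet about the same person end up together without any translation step.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    # Lowercased word tokens; in Python 3, \w also matches accented letters,
    # so tokens from non-English pages are kept intact.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Overlap between two token sets, in [0, 1].
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_pages(pages: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Hypothetical helper: single-link clustering of page indices.

    Two pages share a cluster if a chain of page pairs, each with Jaccard
    similarity >= threshold, connects them. The similarity measure and the
    threshold are illustrative assumptions, not the paper's.
    """
    parent = list(range(len(pages)))  # union-find forest over page indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    toks = [tokens(p) for p in pages]
    for i, j in combinations(range(len(pages)), 2):
        if jaccard(toks[i], toks[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    groups: dict[int, list[int]] = {}
    for i in range(len(pages)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

For example, with two snippets about a chemist (one in English, one in Spanish) and one about a musician:

```python
pages = [
    "John Smith professor of chemistry at Oxford University",
    "John Smith catedrático de química en Oxford University",
    "John Smith guitarist of the rock band The Example",
]
print(cluster_pages(pages))  # → [[0, 1], [2]]
```

A lower threshold (0.2 here) would merge all three pages, which shows how sensitive such a naive similarity cut-off is.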
Authors
Agustín D. Delgado, Raquel Martínez, Soto Montalvo, Víctor Fresno