Choosing the best dictionary for Cross-Lingual Word Sense Disambiguation

Article ID	Journal	Published Year	Pages	File Type
403496	Knowledge-Based Systems	2015	11 Pages	PDF

Abstract

• Selection of the best dictionary for Cross-Lingual Word Sense Disambiguation tasks.
• Potential improvements offered by automatically built dictionaries in ideal systems.
• Performance of different dictionaries on a particular unsupervised CLWSD system.
• Approach for outperforming other systems participating in CLWSD tasks.

The choice of the dictionary that provides the candidate translations from which a system must choose when performing Cross-Lingual Word Sense Disambiguation (CLWSD) is one of the most important steps in such a task. In this work, we compare different dictionaries in two frameworks. In the first, we develop a technique for analysing the potential results of an ideal system using those dictionaries. The second framework considers a particular unsupervised CLWSD system, CO-Graph, and analyses the results obtained when different bilingual dictionaries provide the potential translations. Two CLWSD tasks, from the 2010 and 2013 SemEval competitions, are used for evaluation, and statistics on the words in the test datasets of those competitions are studied. The conclusions of the analysis of dictionaries on a particular system lead us to a proposal that substantially improves the results obtained in that framework. In this proposal, a hybrid system is developed by combining the results provided by a probabilistic dictionary with those obtained with a Most Frequent Sense (MFS) approach. The hybrid approach also outperforms the other unsupervised systems in the considered competitions.
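The idea of measuring what an ideal system could achieve with a given dictionary can be illustrated with a minimal sketch. The abstract does not describe the actual technique, so the following is only one possible reading: it assumes that an ideal system always picks a correct translation whenever the dictionary offers one, and it scores a dictionary by the fraction of test instances for which that is possible. The function name `oracle_coverage`, the data layout, and the toy English-Spanish entries are all hypothetical and are not taken from the paper or from the SemEval datasets.

```python
from typing import Dict, List, Set, Tuple

# One test instance: (source word, set of gold-standard translations).
Instance = Tuple[str, Set[str]]


def oracle_coverage(dictionary: Dict[str, Set[str]],
                    instances: List[Instance]) -> float:
    """Fraction of instances where the dictionary contains a gold translation.

    Assumption: an "ideal" system selects a correct candidate whenever
    at least one is available, so this is an upper bound on accuracy
    for any system restricted to the dictionary's candidates.
    """
    if not instances:
        return 0.0
    hits = 0
    for word, gold in instances:
        candidates = dictionary.get(word, set())
        # The instance is solvable if any candidate is a gold translation.
        if candidates & gold:
            hits += 1
    return hits / len(instances)


# Toy data (hypothetical, for illustration only).
dictionary_a = {"bank": {"banco", "orilla"}, "coach": {"entrenador"}}
dictionary_b = {"bank": {"banco"}, "coach": {"autocar"}}
instances = [
    ("bank", {"orilla"}),
    ("bank", {"banco"}),
    ("coach", {"autocar"}),
    ("coach", {"entrenador"}),
]

print(oracle_coverage(dictionary_a, instances))
print(oracle_coverage(dictionary_b, instances))
```

Under these assumptions, a dictionary with a higher value leaves more room for a real disambiguation system to succeed, while a low value caps performance regardless of how good the selection step is.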

Keywords

Parallel Corpus; Natural Language Processing; Word Sense Disambiguation