A generalized framework for anaphora resolution in Indian languages

Article ID	Journal	Published Year	Pages	File Type
4946543	Knowledge-Based Systems	2016	13 Pages	PDF

Abstract

In this paper, we propose a joint model of feature selection and ensemble learning for anaphora resolution in resource-poor settings such as the Indian languages. The proposed approach is based on multi-objective differential evolution (DE) that optimises five coreference resolution scorers, namely Muc, Bcub, Ceafm, Ceafe and Blanc. The main goal is to determine the best combination of mention classifiers and the most relevant set of features for anaphora resolution. The proposed method is evaluated on three leading Indian languages, namely Hindi, Bengali and Tamil. Experiments on the benchmark datasets of the ICON-2011 Shared Task on Anaphora Resolution in Indian Languages show that our approach attains good accuracy, often better than that of state-of-the-art systems. For Bengali, it achieves F-measure values of 71.89%, 59.61%, 52.55%, 34.45% and 72.52% for Muc, Bcub, Ceafm, Ceafe and Blanc, respectively. For Hindi, we obtain F-measure values of 33.27%, 63.06%, 49.59%, 49.06% and 55.45% for the same five metrics, respectively. To further show the efficacy of the proposed algorithm, we evaluate it on Tamil, a language that belongs to a different family, and obtain F-measure values of 31.79%, 64.67%, 46.81%, 45.29% and 52.80% for Muc, Bcub, Ceafm, Ceafe and Blanc, respectively. Experiments on Dutch yield F-measure values of 17.67%, 74.43%, 58.08%, 59.21% and 55.58% for Muc, Bcub, Ceafm, Ceafe and Blanc, respectively.
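
The abstract does not say how candidate solutions are compared when five scorers are optimised at once. Multi-objective methods commonly rely on Pareto dominance for this, so the following is a minimal sketch under that assumption, not the authors' implementation. The function names `dominates` and `pareto_front` are hypothetical, and the score vectors are made-up illustrative values, not results from the paper.

```python
from typing import List, Sequence

# One candidate = F-measures in the order (Muc, Bcub, Ceafm, Ceafe, Blanc).
# Higher is better on every metric.
Scores = Sequence[float]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if a is no worse than b on every metric and better on at least one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Scores]) -> List[Scores]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative (hypothetical) score vectors for three feature/classifier combinations.
candidates = [
    (0.70, 0.60, 0.50, 0.35, 0.70),
    (0.65, 0.62, 0.50, 0.36, 0.69),  # trades Muc for Bcub and Ceafe: not dominated
    (0.64, 0.59, 0.49, 0.34, 0.68),  # worse than the first on every metric
]
front = pareto_front(candidates)
```

Under this assumption, a search procedure such as DE would retain the non-dominated candidates (here the first two) rather than collapsing the five scores into a single number.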

Keywords

Conditional Random Field (CRF); Multiobjective optimization (MOO); Support vector machine (SVM)