Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6951527 | Computer Speech & Language | 2017 | 26 Pages |
Abstract
Word Sense Disambiguation (WSD) is a fundamental task useful for Information Retrieval, Information Extraction, web search, and indexing, among others. The literature contains several works dedicated to the generic WSD task, but in recent years domain-specific WSD has attracted the attention of several researchers. In this context, this paper describes an approach to domain-specific WSD that selects the predominant sense (synset from WordNet) of ambiguous words. To achieve this, the method uses two corpora: the domain-specific test corpus (containing the target ambiguous words) and a domain-specific auxiliary corpus (obtained by using relevant words from the domain-specific test corpus). The approach has four main stages: (1) auxiliary corpus generation; (2) related features extraction (from the auxiliary corpus); (3) test features extraction (from the test corpus); and (4) features integration. The proposed approach has been tested on domain-specific corpora (Sports and Finance) and on one balanced corpus, the BNC. Even though our WSD approach showed some limitations when dealing with the general-domain corpus, the results obtained for the domain-specific corpora, which are our main interest, were better than those reported in previous works.
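To make the flow of stages (2) to (4) concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how features are extracted, weighted, or matched to synsets, so everything here is an assumption for illustration: the toy `SENSES` inventory standing in for WordNet, the word-count features, the linear `weight` used for integration, and the gloss-overlap scoring. Stage (1), auxiliary corpus generation, is skipped and the auxiliary corpus is simply given.

```python
from collections import Counter

# Hypothetical toy sense inventory standing in for WordNet synsets and glosses.
SENSES = {
    "bank": {
        "bank.n.01": "sloping land beside a river or lake",
        "bank.n.02": "financial institution that accepts deposits and lends money",
    }
}

STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "on", "its"}


def extract_features(corpus):
    """Stages 2 and 3 (assumed form): count content words in a list of sentences."""
    counts = Counter()
    for sentence in corpus:
        for token in sentence.lower().split():
            token = token.strip(".,;:()")
            if token and token not in STOPWORDS:
                counts[token] += 1
    return counts


def integrate(related, test, weight=0.5):
    """Stage 4 (assumed form): linear mix of auxiliary and test feature counts."""
    merged = Counter()
    for word in set(related) | set(test):
        merged[word] = weight * related[word] + (1 - weight) * test[word]
    return merged


def predominant_sense(word, features):
    """Pick the sense whose gloss words carry the most integrated feature weight."""
    scores = {}
    for sense_id, gloss in SENSES[word].items():
        gloss_words = set(extract_features([gloss]))
        scores[sense_id] = sum(features[w] for w in gloss_words)
    return max(scores, key=scores.get)


# Toy Finance-like corpora; the auxiliary corpus is assumed to be already built.
test_corpus = [
    "The bank raised its lending rates.",
    "Shares of the bank fell.",
]
auxiliary_corpus = [
    "The institution accepts deposits and lends money to customers.",
    "A financial institution reported profits.",
]

features = integrate(extract_features(auxiliary_corpus), extract_features(test_corpus))
sense = predominant_sense("bank", features)
print(sense)
```

On this toy input the financial sense is selected because only its gloss shares words with the auxiliary corpus. A real system would query WordNet for the synsets and would likely use richer features than raw word counts.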
Related Topics
Physical Sciences and Engineering
Computer Science
Signal Processing
Authors
Ivan Lopez-Arevalo, Victor J. Sosa-Sosa, Franco Rojas-Lopez, Edgar Tello-Leal