Article ID: 6939148
Journal: Pattern Recognition
Published Year: 2018
Pages: 33 Pages
File Type: PDF
Abstract
Natural Language Processing plays a key role in man-machine interactions, allowing computers to understand and analyze human language. One of its more challenging sub-domains is word sense disambiguation, the task of automatically identifying the intended sense (or concept) of an ambiguous word based on the context in which the word is used. This requires proper feature extraction to capture specific data properties and a dedicated machine learning solution to allow for the accurate labeling of the appropriate sense. However, the pattern classification problem posed here is highly challenging, as we must deal with high-dimensional and multi-class imbalanced data that additionally may be corrupted with class label noise. To address these issues, we propose a local ensemble learning solution. It uses a one-class decomposition of the multi-class problem, assigning an ensemble of one-class classifiers to each of the distributions. The classifiers are trained on the basis of low-dimensional subsets of features and a kernel feature space transformation to obtain a more compact representation. Instance weighting is used to filter out potentially noisy instances and reduce overlapping among classes. Finally, a two-level classifier fusion technique is used to reconstruct the original multi-class problem. Our results show that the proposed learning approach displays robustness to both multi-class skewed distributions and class label noise, making it a useful tool for the considered task.
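The decomposition and fusion steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the one-class classifier, the kernel transformation, or the weighting scheme, so the code below swaps in a simple centroid-distance scorer as a stand-in one-class model and omits kernel mapping and instance weighting entirely. All function names (`fit_one_class`, `support`, `fit`, `predict`), the parameters `n_members` and `subset_size`, and the toy data are hypothetical. What it does show is the overall shape: one ensemble per class, each member trained on a random low-dimensional feature subset, then a two-level fusion (average within each ensemble, then pick the class with the highest fused support).

```python
import math
import random


def fit_one_class(rows, feats):
    """Stand-in one-class model: centroid and mean radius on a feature subset."""
    centroid = [sum(r[f] for r in rows) / len(rows) for f in feats]
    dists = [math.dist([r[f] for f in feats], centroid) for r in rows]
    scale = (sum(dists) / len(dists)) or 1.0  # avoid division by zero
    return feats, centroid, scale


def support(model, x):
    """Support in (0, 1]: decays with distance from the class centroid."""
    feats, centroid, scale = model
    d = math.dist([x[f] for f in feats], centroid)
    return math.exp(-d / scale)


def fit(X, y, n_members=5, subset_size=2, seed=0):
    """One-class decomposition: train one ensemble per class label."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensembles = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        ensembles[label] = [
            # each member sees only a random low-dimensional feature subset
            fit_one_class(rows, rng.sample(range(n_features), subset_size))
            for _ in range(n_members)
        ]
    return ensembles


def predict(ensembles, x):
    """Two-level fusion: mean support per ensemble, then argmax over classes."""
    fused = {
        label: sum(support(m, x) for m in members) / len(members)
        for label, members in ensembles.items()
    }
    return max(fused, key=fused.get)


# Toy imbalanced data (hypothetical): 6 / 3 / 2 instances for senses a / b / c.
X = [
    [0.0, 0.1, 0.0, 0.2], [0.3, 0.0, 0.1, 0.0], [0.1, 0.2, 0.3, 0.1],
    [0.2, 0.3, 0.0, 0.3], [0.0, 0.0, 0.2, 0.1], [0.1, 0.1, 0.1, 0.0],
    [10.0, 10.2, 9.9, 10.1], [10.3, 9.8, 10.0, 10.2], [9.9, 10.1, 10.2, 9.8],
    [-10.0, -9.8, -10.1, -10.2], [-10.2, -10.1, -9.9, -10.0],
]
y = ["a"] * 6 + ["b"] * 3 + ["c"] * 2

ensembles = fit(X, y)
print(predict(ensembles, [0.1, 0.1, 0.1, 0.1]))
```

Because each class is modeled only from its own instances, a minority class is not outvoted by the majority during training, which is the intuition behind using one-class decomposition for skewed multi-class data. A fuller version would replace the centroid scorer with a real one-class classifier and add the instance weights the abstract mentions.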
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition