Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6854705 | Expert Systems with Applications | 2018 | 28 Pages |
Abstract
In this work we experiment with a wide range of phonetic algorithms for the English language. The aim of this study is to determine the best phonetic algorithms within the context of candidate generation for microtext normalization. In other words, we intend to find those algorithms that taking as input non-standard terms to be normalized allow us to obtain as output the smallest possible sets of normalization candidates which still contain the corresponding target standard words. As it will be stated, the choice of the phonetic algorithm will depend heavily on the capabilities of the candidate selection mechanism which we usually find at the end of a microtext normalization pipeline. The faster it can make the right choices among big enough sets of candidates, the more we can sacrifice on the precision of the phonetic algorithms in favour of coverage in order to increase the overall performance of the normalization system.
Keywords
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Yerai Doval, Manuel Vilares, Jesús Vilares,