Article ID: 4977865
Journal: Speech Communication
Published Year: 2016
Pages: 10
File Type: PDF
Abstract
This paper discusses phonemic syllabification for the Indonesian language using a pseudo nearest neighbour rule (PNNR) and phonotactic knowledge. The proposed data-driven model uses a four-feature phoneme encoding and a phonotactic-based pre-syllabification. Evaluation on a 50k-word dataset using 5-fold cross-validation shows that the proposed encoding significantly reduces the average syllable error rate (SER), by 13.90% relative to the commonly used orthogonal binary encoding, and that the pre-syllabification reduces the average SER by up to 17.17% relative to the PNNR without pre-syllabification. The five-fold cross-validation also shows that the proposed PNNR-based syllabification is stable, producing an average SER of 0.64%. This is significantly lower than that of a look-up-based syllabification, which gives an average SER of 2.60%. Most errors come from derivatives with the prefixes 'ber', 'per', and 'ter', as well as from compound words.
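To make the classification rule concrete, the following is a minimal sketch of a pseudo nearest neighbour decision in its commonly described form: for each class, the k nearest training vectors are ranked by distance, the i-th nearest is weighted by 1/i, and the class with the smallest weighted sum wins. It is not the paper's implementation. The Euclidean distance, the two-dimensional toy vectors, the labels `boundary` and `no_boundary`, and the function name `pnnr_classify` are all assumptions made for illustration; the abstract does not specify the four-feature encoding or the distance measure.

```python
import math


def pnnr_classify(query, training, k=3):
    """Return the label whose k nearest vectors give the smallest
    rank-weighted distance sum (weight 1/i for the i-th nearest).

    training: dict mapping a label to a list of feature vectors.
    Assumption: every class has at least k vectors; with fewer, the
    shorter sum would unfairly favour that class.
    """
    best_label, best_score = None, math.inf
    for label, vectors in training.items():
        # Distances from the query to this class, nearest first.
        dists = sorted(math.dist(query, v) for v in vectors)[:k]
        # Pseudo distance: closer neighbours count more.
        score = sum(d / rank for rank, d in enumerate(dists, start=1))
        if score < best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: two well-separated classes in 2D.
training = {
    "boundary": [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1)],
    "no_boundary": [(1.0, 1.0), (0.9, 1.1), (1.2, 0.8)],
}
print(pnnr_classify((0.15, 0.1), training))
```

In a syllabification setting, the query would presumably be the encoded context around a candidate syllable boundary, and the label would say whether a boundary falls there; that mapping is an assumption here, not a detail given in the abstract.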
Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing