Article code: 558468
Journal code: 874934
Publication year: 2012
English article: 15 pages, PDF
Full-text version: free download
English title of the ISI article
Morphological decomposition in Arabic ASR systems
Related topics
Engineering and Basic Sciences > Computer Engineering > Signal Processing
English abstract

In recent years, the use of morphological decomposition strategies for Arabic Automatic Speech Recognition (ASR) has become increasingly popular. Systems trained on morphologically decomposed data are often used in combination with standard word-based approaches, and they have been found to yield consistent performance improvements. The present article contributes to this ongoing research endeavour by exploring the use of the ‘Morphological Analysis and Disambiguation for Arabic’ (MADA) tools for this purpose. System integration issues concerning language modelling and dictionary construction, as well as the estimation of pronunciation probabilities, are discussed. In particular, a novel solution for morpheme-to-word conversion is presented which makes use of an N-gram Statistical Machine Translation (SMT) approach. System performance is investigated within a multi-pass adaptation/combination framework. All the systems described in this paper are evaluated on an Arabic large vocabulary speech recognition task which includes both Broadcast News and Broadcast Conversation test data. It is shown that the use of MADA-based systems, in combination with word-based systems, can reduce the Word Error Rates by up to 8.1% relative.
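The "8.1% relative" figure is a relative Word Error Rate reduction, not a drop in absolute percentage points. A minimal sketch of the arithmetic follows; the function name and the WER values are hypothetical and chosen only for illustration, they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, combined_wer: float) -> float:
    """Return the relative WER reduction, in percent, of a combined
    system over a baseline system. Both inputs are WERs in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - combined_wer) / baseline_wer


# Hypothetical WERs (assumed for illustration, not taken from the paper):
# a word-based baseline and a word + morpheme system combination.
baseline = 20.0
combined = 18.4

absolute = baseline - combined          # difference in percentage points
relative = relative_wer_reduction(baseline, combined)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
```

With these assumed inputs, a drop of 1.6 absolute points corresponds to an 8.0% relative reduction, which is why relative figures look larger than the absolute change in WER.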


► The first detailed exploration of MADA-based decomposition techniques in state-of-the-art ASR systems.
► The introduction of a novel method for morpheme-to-word conversion which makes use of N-gram Statistical Machine Translation techniques.
► First evaluation of MADA-derived pronunciation probabilities.

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 26, Issue 4, August 2012, Pages 229–243