Article ID: 241944
Journal: Advanced Engineering Informatics
Published Year: 2014
Pages: 9
File Type: PDF
Abstract

• The best-performing set of MFCC parameters for dysarthric speech was studied.
• A speaker-independent dysarthric ASR model based on ANNs is proposed.
• The ASR systems trained by mel cepstrum with 12 coefficients provided the best accuracy.
• The proposed speaker-independent ASR model provided a 68.38% word recognition rate.
• The highest word recognition rate of the speaker-dependent ASR systems was 95%.

Dysarthria is a neurological impairment in the control of the motor speech articulators that compromises the speech signal. Automatic Speech Recognition (ASR) can be very helpful for speakers with dysarthria because they are often physically incapacitated as well. Mel-Frequency Cepstral Coefficients (MFCCs) have been shown to be an appropriate representation of dysarthric speech, but the question of which MFCC-based feature set represents dysarthric acoustic features most effectively has not been answered. Moreover, most current dysarthric speech recognisers are either speaker-dependent (SD) or speaker-adaptive (SA), and they generalise poorly when used as speaker-independent (SI) models. First, by comparing the results of 28 dysarthric SD speech recognisers, this study identifies the best-performing set of MFCC parameters for representing dysarthric acoustic features in Artificial Neural Network (ANN)-based ASR. Next, this paper studies the application of ANNs as a fixed-length isolated-word SI ASR for individuals with dysarthria. The results show that the speech recognisers trained on the conventional 12-coefficient MFCC features, without delta and acceleration features, provided the best accuracy, and that the proposed SI ASR recognised the speech of unseen dysarthric evaluation subjects with a word recognition rate of 68.38%.
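
To make the compared feature sets concrete, the following is a minimal sketch of how 12 static MFCCs could be extended with delta and acceleration features and then packed into a fixed-length ANN input. It is not the authors' implementation: the regression window of 2, the edge handling, the zero-padding strategy, and the function names `delta`, `stack`, and `fixed_length` are all assumptions for illustration, and synthetic values stand in for real MFCC frames.

```python
from typing import List

Frame = List[float]


def delta(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based delta features (window size is an assumption).

    Frames beyond either edge are replaced by the first or last frame.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            num = sum(
                n * (frames[min(t + n, last)][k] - frames[max(t - n, 0)][k])
                for n in range(1, window + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out


def stack(*streams: List[Frame]) -> List[Frame]:
    """Concatenate per-frame feature streams, e.g. static + delta."""
    return [[x for frame in frames for x in frame] for frames in zip(*streams)]


def fixed_length(frames: List[Frame], n_frames: int) -> List[float]:
    """Truncate or zero-pad to n_frames, then flatten into one input vector.

    Zero-padding is one hypothetical way to obtain a fixed-length input.
    """
    dim = len(frames[0])
    padding = [[0.0] * dim for _ in range(n_frames - len(frames))]
    return [x for frame in frames[:n_frames] + padding for x in frame]


# Synthetic stand-in for 5 frames of 12 static MFCCs (a linear ramp).
static = [[float(t)] * 12 for t in range(5)]
d = delta(static)        # delta (first-order) features
a = delta(d)             # acceleration (second-order) features

mfcc_12 = static                 # 12 values per frame
mfcc_24 = stack(static, d)       # 24 values per frame
mfcc_36 = stack(static, d, a)    # 36 values per frame

ann_input = fixed_length(mfcc_12, n_frames=8)  # 8 * 12 = 96 values
```

Under these assumptions, the static-only configuration reported as most accurate yields the smallest input vector per frame, which also keeps the ANN input layer a third of the size of the full 36-dimensional variant.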
