Article code: 558469
Journal code: 874934
Publication year: 2012
English article: 16 pages, PDF
Full-text version: free download
English title of the ISI article
A segmental non-parametric-based phoneme recognition approach at the acoustical level
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
Abstract

Although Hidden Markov Models (HMMs) remain the mainstream approach to speech recognition, their intrinsic limitations, such as the first-order Markov assumption and the assumption of independent and identically distributed frames, have led to extensive reliance on higher-level linguistic information to produce satisfactory results. Researchers have therefore begun incorporating various discriminative techniques at the acoustical level to induce more discrimination between speech units. k-nearest neighbour (k-NN) density estimation is discriminant by nature and is widely used in pattern recognition, but its application to speech recognition has been limited to a few experiments. In this paper, we introduce a new segmental k-NN-based phoneme recognition technique. In this approach, a group-delay-based method generates phoneme boundary hypotheses, and an approximate version of k-NN density estimation classifies and scores variable-length segments. During decoding, construction of the phonetic graph starts from the best phoneme boundary setting and proceeds by splitting and merging segments, using the remaining boundary hypotheses and constraints such as phoneme duration and broad-class similarity information. To perform the k-NN search, we use a similarity search algorithm called the Spatial Approximate Sample Hierarchy (SASH). One major advantage of the SASH algorithm is that its computational complexity is independent of the dimensionality of the data, which allows us to use high-dimensional feature vectors to represent phonemes. Because phonemes are used as the units of speech, the search space is very limited and decoding is fast.
Evaluated on the TIMIT test dataset using only the best hypothesis for every segment, and without phoneme transition probabilities, context-based information, or language model information, the proposed algorithm achieves an accuracy of 58.5% with a correctness of 67.8%.
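The abstract reports both accuracy and correctness without defining them. A minimal sketch of how the two usually differ, assuming the conventional HTK-style definitions (correctness ignores insertions, accuracy penalizes them); the function names and the counts below are hypothetical illustrations, not values from the paper:

```python
def correctness(n_ref, substitutions, deletions):
    """Percent correct: share of reference phonemes that were recognized.

    Assumed HTK-style definition: (N - S - D) / N * 100.
    """
    return 100.0 * (n_ref - substitutions - deletions) / n_ref


def accuracy(n_ref, substitutions, deletions, insertions):
    """Percent accuracy: like correctness, but inserted phonemes also count as errors.

    Assumed HTK-style definition: (N - S - D - I) / N * 100.
    """
    return 100.0 * (n_ref - substitutions - deletions - insertions) / n_ref


# Hypothetical error counts for illustration only (not taken from the paper).
n_ref, subs, dels, ins = 1000, 200, 100, 50

corr = correctness(n_ref, subs, dels)
acc = accuracy(n_ref, subs, dels, ins)
print(f"correctness = {corr:.1f}%, accuracy = {acc:.1f}%")
```

Under these definitions accuracy can never exceed correctness, and the gap between the two equals the insertion rate, which is consistent with the reported accuracy being lower than the reported correctness.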


► A simplified acoustical-level phoneme recognition algorithm that combines segmentation, classification, and boundary compensation steps.
► The phoneme segmentation algorithm is based on the spectral energy of speech and the modified group-delay function.
► The phoneme classification algorithm is based on a non-parametric classification scheme.
► Feedback modules compensate for missed boundaries and over-segmentation in the segmentation stage.
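The non-parametric classification step can be illustrated with a small k-NN sketch. It rests on several assumptions that are not from the paper: each segment is already mapped to a fixed-length feature vector, the neighbour search is exact brute force (the paper uses the approximate SASH search instead), and a class score is simply the fraction of the k nearest neighbours carrying that label. The names `knn_scores` and `classify` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_scores(query, train_vectors, train_labels, k=3):
    """Score each label as k_c / k, where k_c counts that label among the k nearest neighbours."""
    # Exact brute-force Euclidean search; a stand-in for the approximate SASH search.
    neighbours = sorted(
        (math.dist(query, vec), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return {label: count / k for label, count in votes.items()}


def classify(query, train_vectors, train_labels, k=3):
    """Return the label with the highest k-NN score."""
    scores = knn_scores(query, train_vectors, train_labels, k)
    return max(scores, key=scores.get)


# Toy 2-D "segment features" (hypothetical values, not real speech data).
train_vectors = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                 (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = ["aa", "aa", "aa", "iy", "iy", "iy"]

query = (0.15, 0.1)
scores_k5 = knn_scores(query, train_vectors, train_labels, k=5)
best = classify(query, train_vectors, train_labels, k=3)
print(best, scores_k5)
```

Keeping the per-label scores, and not only the winning label, is what would let a decoder compare competing segment hypotheses; how the paper actually turns k-NN density estimates into segment scores is not specified in the abstract.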

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 26, Issue 4, August 2012, Pages 244–259