Article ID: 6940141
Journal: Pattern Recognition Letters
Published Year: 2018
Pages: 11
File Type: PDF
Abstract
This study investigates non-conventional body-conductive acoustic sensors for human-human speech communication and automatic speech recognition. Body-conductive sensors are attached directly to the speaker and pick up the uttered speech through the skin and bones, which makes them more robust against environmental noise. A throat microphone, an ear bone microphone, and a standard microphone were evaluated using subjective speech intelligibility tests and automatic speech recognition experiments. In addition to evaluating each sensor on its own, several sensors were integrated to achieve higher recognition rates, using multi-stream hidden Markov model (HMM) decision fusion and late fusion. Late fusion achieved a 40% relative improvement in recognition rate in a noisy environment and a 24% relative improvement in a clean environment. For late fusion, a novel adaptive weighting method was introduced that does not require any pre-adjustment of the weights. The study also presents a technique for automatically segmenting noisy speech data by recording with a body-conductive sensor in conjunction with the desired microphone. Finally, the Lombard effect when using body-conductive acoustic sensors was investigated.
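To make the idea of late fusion with adaptive weights more concrete, the following is a minimal sketch in Python. The abstract does not describe the actual weighting scheme, so the margin-based weights used here are an assumption for illustration only and should not be read as the authors' method. The function names (`softmax`, `margin`, `late_fusion`), the sensor names, and the example scores are all hypothetical.

```python
import math


def softmax(scores):
    """Turn per-hypothesis log-likelihoods into normalized posteriors."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract peak for stability
    total = sum(exps)
    return [e / total for e in exps]


def margin(posteriors):
    """Confidence proxy: gap between the best and second-best posterior."""
    ranked = sorted(posteriors, reverse=True)
    return ranked[0] - ranked[1]


def late_fusion(stream_scores, labels):
    """Fuse per-sensor recognizer outputs at the decision level.

    stream_scores maps a sensor name to a list of log-likelihoods, one per
    entry in labels. Weights are derived from each stream's own margin
    (an assumed heuristic), so no weights are tuned in advance.
    """
    posteriors = {name: softmax(s) for name, s in stream_scores.items()}
    margins = {name: margin(p) for name, p in posteriors.items()}
    total = sum(margins.values())
    if total == 0:
        # All streams are equally unsure: fall back to uniform weights.
        weights = {name: 1.0 / len(margins) for name in margins}
    else:
        weights = {name: m / total for name, m in margins.items()}
    fused = [
        sum(weights[name] * posteriors[name][i] for name in posteriors)
        for i in range(len(labels))
    ]
    best = max(range(len(labels)), key=fused.__getitem__)
    return labels[best], weights


# Hypothetical scores: the standard microphone is nearly undecided (as it
# might be in noise), while the throat microphone clearly prefers "no".
labels = ["yes", "no", "stop"]
scores = {
    "standard": [-10.0, -10.1, -10.2],
    "throat": [-12.0, -8.0, -13.0],
}
decision, weights = late_fusion(scores, labels)
print(decision, weights)
```

In this sketch the stream with the larger posterior margin dominates the fused decision, which is one simple way a weighting could adapt per utterance without a tuning step.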
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition