Estimation of binaural speech intelligibility using machine learning

Article ID	Journal	Published Year	Pages	File Type
5010774	Applied Acoustics	2018	9 Pages	PDF

Abstract

We proposed and evaluated a speech intelligibility estimation method for binaural signals. We assumed that both the speech and the competing noise are directional sources. In this case, when the speech and noise are located away from each other, intelligibility generally improves because the auditory system can segregate the two streams. However, since intelligibility tests, as well as intelligibility estimation, are conducted on monaurally recorded signals, this potential gain from source segregation is not accounted for, and intelligibility is often underestimated. Accordingly, to estimate intelligibility while taking this binaural advantage into account, we trained a mapping function between subjective intelligibility and objective measures that reflect the binaural advantage. We calculated the SNR using (1) a simple binaural-to-monaural mix-down, which models the conventional estimation, (2) simple pooling of both binaural channels (pooled channel), (3) selection of the channel with the better SNR from the left and right channels (better-ear), and (4) sub-band-wise better-ear selection (band-wise better-ear). For the mapping function, we trained neural networks (NN), support vector regression (SVR), and random forests (RF), and compared them with simple logistic regression (LR). We also investigated which sub-band configuration gives the best estimation accuracy by balancing frequency resolution against the amount of training data. The combination of the better-ear model and RF gave the best results, with a root mean square error (RMSE) of about 0.11 and a correlation of 0.92 in an open-set test.
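The four SNR variants listed in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact SNR definition, so the code assumes that separate speech and noise signals are available for each ear, that SNR is a plain power ratio in dB, that "pooled channel" means pooling the power of both channels, and that sub-bands are formed by summing FFT power between hypothetical band edges. The function names `snr_db`, `band_powers`, and `binaural_snrs` are made up for this example.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def snr_db(speech, noise):
    """Broadband SNR in dB from separate speech and noise signals."""
    ps = np.mean(np.square(speech))
    pn = np.mean(np.square(noise))
    return 10.0 * np.log10((ps + EPS) / (pn + EPS))


def band_powers(x, fs, edges):
    """Summed FFT power of x in each band [edges[i], edges[i+1])."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.array([
        spec[(freqs >= lo) & (freqs < hi)].sum()
        for lo, hi in zip(edges[:-1], edges[1:])
    ])


def binaural_snrs(speech_lr, noise_lr, fs, edges):
    """Compute four SNR variants.

    speech_lr, noise_lr: arrays of shape (2, n_samples); row 0 is the
    left ear, row 1 the right ear. edges: assumed sub-band edges in Hz.
    """
    # (1) Mix-down: average the two ears into one monaural signal first.
    mixdown = snr_db(speech_lr.mean(axis=0), noise_lr.mean(axis=0))

    # (2) Pooled channel (assumed meaning): pool the power of both ears.
    pooled = snr_db(speech_lr.ravel(), noise_lr.ravel())

    # (3) Better-ear: take whichever ear has the higher broadband SNR.
    per_ear = [snr_db(speech_lr[c], noise_lr[c]) for c in range(2)]
    better_ear = max(per_ear)

    # (4) Band-wise better-ear: choose the better ear in each sub-band.
    band_snr = np.array([
        10.0 * np.log10(
            (band_powers(speech_lr[c], fs, edges) + EPS)
            / (band_powers(noise_lr[c], fs, edges) + EPS)
        )
        for c in range(2)
    ])
    bandwise_better_ear = band_snr.max(axis=0)

    return {
        "mixdown": mixdown,
        "pooled": pooled,
        "better_ear": better_ear,
        "bandwise_better_ear": bandwise_better_ear,
    }


# Synthetic example: the "speech" tone is stronger on the left,
# the "noise" tone is stronger on the right.
fs = 16000
t = np.arange(fs) / fs
tone_s = np.sin(2 * np.pi * 500 * t)
tone_n = np.sin(2 * np.pi * 2000 * t)
example = binaural_snrs(
    np.stack([1.0 * tone_s, 0.5 * tone_s]),
    np.stack([0.2 * tone_n, 1.0 * tone_n]),
    fs,
    edges=[0, 1000, 4000, 8000],
)
```

In this synthetic case the better-ear SNR is well above the mix-down SNR, which mirrors the abstract's point that monaural estimation can miss the benefit of spatially separated sources. Any of these values (or the per-band vector) could then serve as input features to a regression model such as a random forest.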

Keywords

Speech intelligibility; Machine learning