Non-speaker information reduction from Cosine Similarity Scoring in i-vector based speaker verification

Article ID	Journal	Published Year	Pages	File Type
455162	Computers & Electrical Engineering	2015	13 Pages	PDF

Abstract

Cosine similarity and Probabilistic Linear Discriminant Analysis (PLDA) in i-vector space are two state-of-the-art scoring methods in speaker verification field. While PLDA usually gives better accuracy, Cosine Similarity Scoring (CSS) remains a widely used method due to simplicity and acceptable performance. In this domain, several channel compensation and score normalization methods have been proposed to improve the performance. We investigate non-speaker information in cosine similarity metric and propose a new approach to remove it from the decision making process. I-vectors hold a large amount of non-speaker information such as channel effects, language, and phonetic content. This type of information increases the verification error rate and hence it should be removed from the scoring method. To this end we propose a method that estimates non-speaker information between two i-vectors using the development set and subtracts it from cosine similarity. The results indicate that the proposed method performed better than other implemented methods based on the cosine similarity. Furthermore, in certain cases the performance of this method was better than the PLDA method and when combined with PLDA performance was improved in most cases.

Keywords

Speaker verification Cosine similarity i-vector