Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking

Article ID	Journal	Published Year	Pages	File Type
484876	Procedia Computer Science	2015	5 Pages	PDF

Abstract

Speech in natural environments is subject to various acoustic interferences, and many applications require an effective way to separate the dominant signal from the interference. In this paper, a Short-time Fourier Transform (STFT) based unsupervised method for single-channel speech separation is proposed. It uses the pitch information of the dominant and interfering speakers to generate a time-frequency mask based on their pitch frequencies. Through rigorous objective and subjective evaluations, it is shown that the proposed system provides better Signal-to-Noise Ratio (SNR) and Perceptual Evaluation of Speech Quality (PESQ) scores than other related methods available in the literature.
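
The abstract does not state the exact masking rule, so the following is only a minimal sketch of one common way a pitch-based binary time-frequency mask can be built, not the authors' method. It assumes the pitch tracks of both speakers are already available per STFT frame, and it assigns each frequency bin to the speaker whose nearest harmonic is closer. The names `harmonic_distance` and `pitch_mask`, the convention that a pitch of 0 marks an unvoiced frame, and the tie-breaking in favour of the target are all illustrative assumptions.

```python
def harmonic_distance(freq_hz, pitch_hz):
    """Distance in Hz from freq_hz to the nearest harmonic (k >= 1) of pitch_hz."""
    k = max(1, round(freq_hz / pitch_hz))
    return abs(freq_hz - k * pitch_hz)


def pitch_mask(bin_freqs_hz, target_pitch_hz, interferer_pitch_hz):
    """Return a binary mask indexed as mask[frame][bin].

    A bin is set to 1 when it lies at least as close to a harmonic of the
    target pitch as to a harmonic of the interferer pitch (assumed rule).
    A pitch value <= 0 is treated as an unvoiced frame (assumed convention).
    """
    mask = []
    for f0_target, f0_interf in zip(target_pitch_hz, interferer_pitch_hz):
        row = []
        for f in bin_freqs_hz:
            if f0_target <= 0:
                row.append(0)  # target unvoiced: keep nothing in this frame
            elif f0_interf <= 0:
                row.append(1)  # interferer unvoiced: keep every bin
            else:
                d_target = harmonic_distance(f, f0_target)
                d_interf = harmonic_distance(f, f0_interf)
                row.append(1 if d_target <= d_interf else 0)
        mask.append(row)
    return mask


# One frame, four bins; target pitch 100 Hz, interferer pitch 150 Hz.
example = pitch_mask([100.0, 150.0, 200.0, 450.0], [100.0], [150.0])
print(example)  # → [[1, 0, 1, 0]]
```

In a full system the mask would be multiplied element-wise with the STFT of the mixture and the result inverted back to the time domain; both the STFT and the pitch estimation are left out here to keep the sketch self-contained.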