Article ID: 558483
Journal: Computer Speech & Language
Published Year: 2011
Pages: 13 Pages
File Type: PDF
Abstract

We consider the problem of using speech processing to characterize an aggregate of voice data, rather than drawing inferences about individual voice cuts. We derive simple turn-taking models from speaker activity detection output on the Switchboard-1 corpus. These models can be used to cluster speakers into turn-taking ‘styles.’ Demographic fields and turn-taking behavior prove to be statistically dependent, so observed speaker activity improves estimates of the demographics of held-out data. Finally, we use turn-taking style to estimate speaker influence.

► Turn-taking study results from speech activity detection on Switchboard-1.
► Speakers can be clustered via turn-taking model likelihoods.
► Turn-taking styles can be used to estimate data set demographics.
► Speaker influence related to speaker degrees.
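
As a rough illustration of what "clustering via turn-taking model likelihoods" could look like, the sketch below fits a first-order Markov chain to per-frame speech activity for a two-party call and scores a conversation under a given model. The abstract does not specify the model form, so the four-state encoding, the Markov assumption, the add-one smoothing, and all function names here are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

# Assumed joint-activity encoding for a two-party call (not from the paper):
# 0 = both silent, 1 = only A speaking, 2 = only B speaking, 3 = overlap.
N_STATES = 4


def joint_states(act_a, act_b):
    """Combine two binary per-frame activity sequences into joint states."""
    return [a + 2 * b for a, b in zip(act_a, act_b)]


def fit_transitions(states, smoothing=1.0):
    """Estimate a first-order transition matrix with additive smoothing."""
    counts = Counter(zip(states, states[1:]))
    matrix = []
    for i in range(N_STATES):
        row = [counts[(i, j)] + smoothing for j in range(N_STATES)]
        total = sum(row)
        matrix.append([c / total for c in row])
    return matrix


def log_likelihood(states, matrix):
    """Log-probability of the observed transitions under a transition matrix."""
    return sum(math.log(matrix[i][j]) for i, j in zip(states, states[1:]))


# Toy activity data (invented): long turns versus rapid alternation.
long_turns = joint_states([1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
                          [0, 0, 0, 0, 1, 1, 0, 0, 0, 0])
rapid = joint_states([1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
                     [0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

model_long = fit_transitions(long_turns)
model_rapid = fit_transitions(rapid)

# Assign a conversation to whichever model explains it better.
scores = {
    "long": log_likelihood(rapid, model_long),
    "rapid": log_likelihood(rapid, model_rapid),
}
best_style = max(scores, key=scores.get)
```

Under these assumptions, a clustering procedure would alternate between assigning each conversation to its best-scoring model and refitting each model on its assigned conversations, in the manner of k-means with likelihood in place of distance.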
