Article ID: 10321083
Journal: Cognitive Systems Research
Published Year: 2005
Pages: 24
File Type: PDF
Abstract
Synchrony detection between different sensory channels appears critically important for learning and cognitive development. In this paper we compare infant studies of audio-visual synchrony detection with a model of synchrony detection based on Gaussian mutual information [Hershey, J., & Movellan, J. (2000). Audio-vision: Using audio-visual synchrony to locate sounds. In S. A. Solla, T. K. Leen, & K. R. Müller (Eds.), Advances in neural information processing systems (Vol. 12, pp. 813-819). Cambridge, MA: MIT Press], augmented with methods for quantitative synchrony estimation. Five infant-model comparisons are presented, using stimuli covering a broad range of audio-visual integration types. While both infants and the model discriminated each stimulus type, the model was most successful with stimuli comprising (a) synchronized punctate motion and speech, (b) visually balanced left and right instances of the same person talking, with speech synchronized to only one side, and (c) two speech audio sources and a dynamic-face motion source. More difficult for the model were conditions with (d) left and right instances of two different people talking, with speech synchronized to only one side, and (e) two speech audio sources and more abstract visual dynamics (an oscilloscope instead of a face). As a first approximation, this model of synchrony detection using low-level sensory features (e.g., RMS audio, grayscale pixels) is a candidate mechanism for infant detection of audio-visual synchrony.
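The Hershey-Movellan measure the abstract builds on treats each pixel's grayscale intensity and an audio feature as jointly Gaussian over a temporal window, in which case their mutual information reduces to -1/2 log(1 - rho^2), where rho is the correlation between the two time series. The sketch below is a minimal illustration of that idea, not the paper's code: the function name, the use of whole-clip statistics rather than a sliding window, and the numerical safeguards are assumptions introduced here.

```python
import numpy as np

def gaussian_mutual_information(audio_rms, frames):
    """Estimate Gaussian MI between an audio feature and each pixel.

    audio_rms: (T,) RMS audio energy, one value per video frame.
    frames:    (T, H, W) grayscale frames over the same window.

    For jointly Gaussian scalars, I(A; V) = -1/2 * log(1 - rho^2),
    where rho is the Pearson correlation over the window.
    """
    T, H, W = frames.shape
    a = audio_rms - audio_rms.mean()
    v = frames.reshape(T, -1).astype(float)
    v = v - v.mean(axis=0)
    # Covariance and correlation between the audio and every pixel
    # time series (population statistics over the T frames).
    cov = a @ v / T
    rho = cov / (a.std() * v.std(axis=0) + 1e-12)
    rho = np.clip(rho, -0.999999, 0.999999)  # keep log finite
    mi = -0.5 * np.log(1.0 - rho ** 2)
    return mi.reshape(H, W)  # per-pixel synchrony map
```

Pixels whose intensity covaries with the audio (e.g., a talking mouth) receive high MI, which is how Hershey and Movellan use the measure to locate sound sources; summing or averaging such a map over a stimulus region gives the kind of quantitative synchrony estimate the comparisons above rely on.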
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence