Article ID: 566075
Journal: Speech Communication
Published Year: 2011
Pages: 9
File Type: PDF
Abstract

This paper proposes new interfaces that use semi-synchronous speech and pen input for mobile environments. A user speaks while writing, and the pen input complements the speech so that recognition performance is higher than with speech alone. Because speaking and writing differ in input speed and in the information they carry, a time lag always exists between the two modes. Conventional multi-modal recognition algorithms therefore cannot be applied directly to this interface. To tackle this problem, we developed a multi-modal recognition algorithm that handles this asynchronicity (time lag) by using a segment-based unification scheme and a method of adapting to the time-lag characteristics of individual users. Five different pen-input interfaces, in each of which pen input is assumed to be given for a phrase unit in speech, were evaluated in speech recognition experiments using noisy speech data. The recognition accuracy of the proposed method was higher than that of speech alone for all five interfaces. We also carried out a subjective test to examine the usability of each interface, and found a trade-off between usability and improvement in recognition performance.
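To make the time-lag problem concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes that speech phrases and pen inputs are each available as timed segments, estimates a single per-user lag from a few calibration pairs, and then matches each pen segment to the nearest speech phrase after compensating for that lag. The names Segment, estimate_user_lag, and align, and all timing values, are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Segment:
    label: str
    start: float  # seconds
    end: float    # seconds


def midpoint(seg: Segment) -> float:
    return (seg.start + seg.end) / 2


def estimate_user_lag(pairs):
    """Mean offset (pen midpoint minus speech midpoint) over calibration pairs.

    Assumption: `pairs` holds (speech, pen) segments known to belong together.
    """
    return mean(midpoint(pen) - midpoint(speech) for speech, pen in pairs)


def align(speech, pen, lag=0.0):
    """Map each pen segment to the speech phrase nearest in time.

    The pen midpoint is shifted back by `lag` before the comparison.
    """
    result = {}
    for p in pen:
        shifted = midpoint(p) - lag
        best = min(speech, key=lambda s: abs(midpoint(s) - shifted))
        result[p.label] = best.label
    return result


# Hypothetical data: three spoken phrases, with pen input trailing by about 0.9 s.
speech = [Segment("A", 0.0, 1.0), Segment("B", 1.0, 2.0), Segment("C", 2.0, 3.0)]
pen = [Segment("pa", 1.0, 1.8), Segment("pb", 2.0, 2.8), Segment("pc", 3.0, 3.8)]

user_lag = estimate_user_lag([(speech[0], pen[0]), (speech[1], pen[1])])
naive = align(speech, pen)               # no lag compensation
adapted = align(speech, pen, user_lag)   # per-user lag compensation
```

In this toy data, the uncompensated matching pairs pen segments with the wrong phrases, while the lag-adapted matching recovers the intended one-to-one pairing. A real system would unify recognition hypotheses rather than fixed labels; this sketch only illustrates why a per-user lag estimate matters.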

Research highlights
► Interface for semi-synchronous speech and pen input for mobile environments.
► Five kinds of pen gestures are examined.
► We solve the problem of asynchronicity between speech and pen input.
► We confirmed its effectiveness under noisy conditions.

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing