A Tool to Solve Sentence Segmentation Problem on Preparing Speech Database for Indonesian Text-to-speech System

Article ID	Journal	Published Year	Pages	File Type
485454	Procedia Computer Science	2016	6 Pages	PDF

Abstract

Preparing training data for a text-to-speech (TTS) system can be difficult, because the recorded audio does not always match the prepared text. To overcome differences between the audio and text data, we developed a tool that segments audio data into sentences. Segmenting audio into sentences manually is known to require effort and resources. This paper presents a solution that alleviates problems encountered while segmenting audio data for an Indonesian TTS system. The tool is based on the fact that Bahasa Indonesia is a syllable-timed language. We found that our tool reduces the resources needed to segment Indonesian audio data.
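
The abstract does not describe the tool's algorithm, so the following is only a minimal sketch of how the syllable-timed property could be used. It assumes that every syllable takes roughly the same time, approximates the syllable count of a sentence by counting vowel letters (a rough heuristic that ignores diphthongs), and ignores pauses between sentences. The function names `count_syllables` and `estimate_boundaries` are hypothetical and are not taken from the paper.

```python
VOWELS = "aiueo"


def count_syllables(text: str) -> int:
    """Rough syllable count: one syllable per vowel letter.

    Assumption for illustration only; diphthongs such as "ai" or "au"
    are counted as two syllables here.
    """
    return sum(1 for ch in text.lower() if ch in VOWELS)


def estimate_boundaries(sentences, total_duration):
    """Split a recording of `total_duration` seconds into one
    (start, end) interval per sentence, proportional to the
    estimated syllable count of each sentence.
    """
    # Guard against sentences with no vowels so every sentence gets some time.
    counts = [max(count_syllables(s), 1) for s in sentences]
    total = sum(counts)

    boundaries = []
    elapsed = 0   # syllables consumed so far
    start = 0.0
    for count in counts:
        elapsed += count
        end = total_duration * elapsed / total
        boundaries.append((start, end))
        start = end
    return boundaries
```

For example, `estimate_boundaries(["Saya makan nasi.", "Dia pergi."], 10.0)` gives the first sentence a larger share of the recording because it contains more syllables. Estimates of this kind would normally serve only as starting points to be refined against the actual audio.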

Keywords

TTS; Training data