Creation and Application of Dialogue Corpora

Shuichi ITAHASHI, Kazuyuki TAKAGI, Naoki OGURI, Naoko HOURA

Institute of Information Sciences & Electronics, University of Tsukuba

1-1-1 Tennodai, Tsukuba, Ibaraki, 305 Japan

e-mail: itahashi@milab.is.tsukuba.ac.jp

This paper first describes the necessity and significance of speech corpora, together with the state of the art and future plans of speech corpus research; it then considers the factors that should be taken into account when designing speech corpora. Three levels should be considered: the syllable, the word, and the sentence. The paper enumerates several statistics of text material based on phonemes, moras, syllables, words, and sentences, and argues that the first step toward a design method for speech corpora is to compare the statistics of existing speech corpora.

Temporal characteristics of utterances in read speech and spoken dialogues were analyzed to identify acoustic differences between read speech and spontaneous speech. Speech data were segmented into utterance units, each of which is either a speech interval bounded by pauses or an utterance such as an interjection, filled pause, response word, or false start. Each utterance unit was then labeled with its transcription and utterance category. First, the durations of pauses and utterance units were examined. The distributions of pause and utterance unit durations were wider in spoken dialogues than in read speech, reflecting the various disfluencies of spontaneous speech. Second, the temporal characteristics of utterance units change dynamically as the dialogue develops. Interjections and filled pauses at topic boundaries were 1.7 times longer than those elsewhere, which indicates that the duration of interjections and fillers signals a topic shift in the dialogue. The response time between the client and the agent becomes shorter as the dialogue proceeds, indicating that the interaction between the participants becomes more active and smooth. The average speech rate was relatively slow in the main part of the dialogue, where specific knowledge about the task is exchanged.

We then investigated the statistical characteristics of the speech texts of the Acoustical Society of Japan Speech Corpus. The corpus is composed of the ATR 503 phonetically balanced sentences, 1027 guide task sentences of various kinds, and 3129 transcribed sentences of simulated dialogues. Among the occurrence frequencies of phonemes and moras, the joint occurrence frequencies of two moras showed that the characteristics of a text can be inferred from its mora pairs with high joint occurrence frequency.
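
As an illustration of the mora-pair statistic mentioned above, the following is a minimal sketch, not the procedure used in this study. It assumes each sentence has already been segmented into a list of moras; the function names and the romanized toy sentences are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def mora_pair_counts(sentences):
    """Count adjacent mora pairs within each sentence.

    `sentences` is a list of mora lists (assumed pre-segmented).
    Pairs are not formed across sentence boundaries.
    """
    counts = Counter()
    for moras in sentences:
        counts.update(zip(moras, moras[1:]))
    return counts


def relative_frequencies(counts):
    """Convert raw pair counts into relative joint occurrence frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical toy input, for illustration only.
sentences = [
    ["de", "N", "wa", "ba", "N", "go", "u"],
    ["de", "N", "wa", "de", "su"],
]

counts = mora_pair_counts(sentences)
freqs = relative_frequencies(counts)

# List the most frequent mora pairs with their relative frequencies.
for pair, n in counts.most_common(3):
    print(pair, n, round(freqs[pair], 3))
```

Comparing the highest-ranked pairs across text sets (for example, read sentences versus dialogue transcriptions) is one simple way to look for the text-dependent differences described above.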

Keywords: speech corpora, database, statistics, utterance unit, temporal characteristics