Information Theoretic Modeling and Development of a Speech Dialog System

Yasuhisa NIIMI and Yutaka KOBAYASHI

Department of Electronics and Information Science, Kyoto Institute of Technology

Matsugasaki, Sakyo-ku, Kyoto 606, Japan

E-mail: niimi@dj.kit.ac.jp

The purpose of our study is to model and develop a speech dialog system based on the idea that the exchange of information between a speaker and a dialog system can be viewed as communication over a noisy channel, because speech recognition errors are inevitable at the current state of the art. We investigated the dialog control mechanism for such a system and a method for spotting nonsense words such as 'anoh' and 'ehtto', which are often inserted in spontaneous utterances. We proposed two dialog control strategies to facilitate speech recognition. The first strategy is based on experiments showing that a human changes his speaking style depending on that of his partner, a simulated dialog system. Using this behavior pattern, the system can make the speaker change his speaking style unconsciously so that his voice is recognized better. The second strategy is based on predicting what the speaker will say next. For a dialog system whose task is to give the speaker sightseeing information about Kyoto, we developed a top-down discourse analysis that anticipates the speaker's next utterance. These top-down hypotheses about what the speaker will say next proved very effective in reducing the search space. We attempted to spot nonsense words with recurrent neural networks. Word spotting was performed for five classes of nonsense words, which cover over 80% of the nonsense words actually observed in spontaneous utterances. A network was trained to fire only on words belonging to one of these five classes. The average detection rate and false alarm rate were 83% and 44%, respectively.
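The abstract combines two ideas: recognition as decoding over a noisy channel, and top-down prediction of the next utterance to shrink the search space. A minimal sketch of how they fit together, assuming the standard Bayesian decoding rule W* = argmax_W P(A|W) P(W), where A is the acoustic input and W a candidate utterance, and assuming the discourse analysis supplies P(W) as a distribution over expected utterances. The function `decode`, the hypothesis strings and all scores below are hypothetical illustrations, not the authors' system or data.

```python
import math

def decode(acoustic_loglik, prior):
    """Noisy-channel decoding with a top-down prior (hypothetical sketch).

    acoustic_loglik: dict mapping hypothesis W -> log P(A | W) from a recognizer
    prior: dict mapping hypothesis W -> P(W) predicted from the discourse;
           hypotheses absent from the prior (or with zero probability)
           are pruned before scoring.
    Returns (best_hypothesis, number_of_hypotheses_scored).
    """
    best, best_score, scored = None, -math.inf, 0
    for w, ll in acoustic_loglik.items():
        p = prior.get(w, 0.0)
        if p <= 0.0:
            continue  # pruned: the discourse model does not expect this utterance
        scored += 1
        score = ll + math.log(p)  # log P(A|W) + log P(W)
        if score > best_score:
            best, best_score = w, score
    return best, scored

# Hypothetical recognizer scores: the off-task string fits the audio best.
acoustic = {
    "where is Ginkakuji": -10.0,
    "where is Kinkakuji": -10.5,
    "when is Kinkakuji open": -14.0,
    "wear his kin cake": -9.5,
}

# Without prediction: a uniform prior over every hypothesis.
uniform = {w: 1.0 / len(acoustic) for w in acoustic}

# With prediction: after the system has just talked about Kinkakuji,
# the (hypothetical) discourse model expects only Kinkakuji-related queries.
predicted = {
    "where is Kinkakuji": 0.6,
    "when is Kinkakuji open": 0.3,
    "where is Ginkakuji": 0.1,
}

if __name__ == "__main__":
    print(decode(acoustic, uniform))    # off-task string wins, 4 scored
    print(decode(acoustic, predicted))  # Kinkakuji query wins, 3 scored
```

With the uniform prior every hypothesis is scored and the acoustically closest but off-task string wins; with the predicted prior the off-task string is never scored and the prior outweighs the small acoustic advantage of "Ginkakuji". This illustrates, under the stated assumptions, why top-down hypotheses both reduce the search space and can correct recognition errors.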

Keywords: dialog modeling, dialog control strategy, nonsense word spotting, recurrent neural network.