Recognition of Dialog Speech

-- Processing of Spontaneous Speech and Speaker Adaptation --

Seiichi NAKAGAWA, Mikio YAMAMOTO, Yuji MORIYA, Atsuhiko KAI, Satoshi KOBAYASHI

Department of Information and Computer Sciences, Toyohashi University of Technology

Tempaku-cho, Toyohashi, Aichi, 441, Japan

e-mail: nakagawa@say1gw.tutics.tut.ac.jp

We investigate a method for recognizing spontaneous speech and an unsupervised speaker adaptation method, and we report an analysis of interjections and repairs in real dialog. First, we propose and evaluate a method for processing interjections and unknown words so that a speech recognition system can deal with spontaneous speech in dialog. By simulating unknown word processing, we assessed the relationship between the word recognition accuracy of a system and its detection rate for unknown words. We also evaluated the performance of our speech recognition system. Second, to prepare data for developing a method of processing interjections and repairs, we examined interjections and repairs in real spoken Japanese dialogs. The analysis of interjections shows that some interjections are used very frequently and that interjections can appear at the boundaries of Bunsetu (phrases). The analysis of repairs shows that repetition of the same words is more common than any other type of repair, accounting for a little over 30% of all repairs. The other types include substitutions and insertions. Third, to improve speech recognition accuracy, we propose and evaluate an unsupervised speaker adaptation method based on sequential concatenation training that uses MAPE (Maximum A Posteriori probability Estimation) for continuous parameter HMMs. The label sequences were provided automatically by a recognizer running on a speaker-independent model. The experimental results show that, given a better initial model, unsupervised adaptation performs comparably to supervised adaptation.

Keywords: spontaneous speech recognition, unknown word, interjection, repair, unsupervised speaker adaptation
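
The MAP-based speaker adaptation mentioned in the abstract can be illustrated with a minimal sketch of the standard MAP update for a single Gaussian mean of a continuous HMM. This is not necessarily the exact formulation used in this work: the function name `map_adapt_mean`, the prior weight `tau`, and the use of per-frame occupation probabilities as `posteriors` are assumptions made for illustration. In the unsupervised setting, those occupation probabilities would come from an alignment against labels produced by the speaker-independent recognizer.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Standard MAP update of one Gaussian mean (illustrative sketch).

    prior_mean : list of floats, the speaker-independent mean
    frames     : list of feature vectors from the new speaker
    posteriors : per-frame occupation probabilities for this Gaussian
    tau        : prior weight (hypothetical default); larger values keep
                 the estimate closer to the speaker-independent mean
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)

    # Posterior-weighted sum of the adaptation frames.
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # Interpolate between the prior mean and the data according to
    # how much adaptation data has been observed.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


if __name__ == "__main__":
    si_mean = [0.0, 0.0]
    speaker_frames = [[2.0, 2.0], [4.0, 4.0]]
    gammas = [1.0, 1.0]
    print(map_adapt_mean(si_mean, speaker_frames, gammas, tau=2.0))
```

With no adaptation data the estimate stays at the speaker-independent mean, and it moves toward the speaker's own data as more frames are observed, which is why the quality of the initial model matters when the labels are produced automatically.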