A Method on Speech Synthesis for Spoken Dialogue Systems and Psychological Assessment of the Synthetic Speech

Keikichi HIROSE,

Noboru TAKAHASHI, Nobuaki MINEMATSU, Toru SENOO, Mayumi SAKATA

Department of Electronic Engineering, Faculty of Engineering, University of Tokyo

7-3-1 Hongo, Bunkyo-ku, 113 Tokyo, Japan

e-mail: hirose@gavo.t.u-tokyo.ac.jp

We are developing technology that generates response sentences appropriate to the dialogue flow and converts them into high-quality, easy-to-understand speech with natural-sounding prosodic features. The concrete research aims are as follows:

1. Generate sentences from response content given in a deep-level semantic representation, which may include information on focal position, ellipsis, and anaphora. The generated sentences should carry high-level linguistic information, such as syntactic and discourse structures, as well as information on the intentions to be conveyed.

2. Synthesize response speech with the prosodic features of spoken dialogue. The speech should clearly convey the intended linguistic information and intentions.

3. Improve the conventional terminal-analogue synthesis method so that the synthetic speech is also of high quality in terms of segmental features.

As an example spoken dialogue system, we adopted a question-and-answer system on ski areas. Based on an analysis of dialogue speech using a quantitative model for generating fundamental frequency contours, the prosodic rules for read speech were modified into rules for spoken dialogue. Paralinguistic factors were also incorporated. A method was further developed that generates an appropriate response to the user's input in the form of a deep-level semantic representation and converts it into surface sentences. As for the segmental features of speech, a novel configuration was developed for the terminal-analogue speech synthesizer to produce high-quality speech. An experiment on synthesizing response speech was conducted. The results of a listening test on the synthetic speech indicated the validity of the developed methods, though further improvements are necessary.
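The abstract does not name the quantitative model used for fundamental frequency contours. As an illustration only, the sketch below assumes a command-response model of the Fujisaki type, in which the logarithm of the fundamental frequency is a baseline plus the responses to phrase commands (impulses) and accent commands (steps). The function names, the command format, and the constants `ALPHA`, `BETA`, and `GAMMA` are assumptions chosen for this sketch, not values or rules taken from this work.

```python
import math

# Assumed, commonly quoted constants; not taken from this work.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_hz, phrase_cmds, accent_cmds):
    """Return F0 values in Hz at the given times (seconds).

    phrase_cmds: list of (onset_time, magnitude) impulses.
    accent_cmds: list of (onset_time, offset_time, amplitude) steps.
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_hz)
        for onset, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - onset)
        for onset, offset, amplitude in accent_cmds:
            log_f0 += amplitude * (
                accent_response(t - onset) - accent_response(t - offset)
            )
        contour.append(math.exp(log_f0))
    return contour


if __name__ == "__main__":
    # Hypothetical utterance: one phrase command and one accent command.
    times = [i * 0.05 for i in range(40)]
    for t, f0 in zip(times, f0_contour(times, 100.0, [(0.0, 0.5)], [(0.3, 0.8, 0.4)])):
        print(f"{t:.2f} s  {f0:.1f} Hz")
```

Under this assumption, adapting prosodic rules from read speech to spoken dialogue would amount to changing how command timings and magnitudes are assigned from the linguistic and paralinguistic information, while the contour generation itself stays the same.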

Keywords: spoken dialogue, response speech, speech synthesis, prosodic rules, fundamental frequency contours, sentence generation, speech synthesizer