A Computational Model of Man-Machine Dialogue

Shuji DOSHITA, Masahiro ARAKI, and Tatsuya KAWAHARA

Department of Information Science, Kyoto University
Sakyo-ku, Kyoto 606-01, Japan
e-mail: doshita@kuis.kyoto-u.ac.jp

{1. Introduction}
We propose a dialogue model that reflects two important aspects of a spoken dialogue system (SDS): being 'robust' and being 'cooperative'. For this purpose, our model has two main inference spaces: the Conversational Space (CS) and the Problem Solving Space (PSS). CS is a dynamically constructed Bayesian network that represents the meaning of utterances and general dialogue rules; the 'robust' aspect is treated in CS. PSS is a network, based on the so-called Event Hierarchy, that represents the structure of task-domain problems; the 'cooperative' aspect is mainly treated in PSS. The system's processing, from meaning understanding through response generation, constructs CS and makes inferences in PSS, and it is modeled as a sequence of five steps. From this point of view, cooperative problem-solving dialogue is a process of constructing CS and achieving a goal in PSS through these five steps.

{2. Outline of Our Approach}
As mentioned above, the major problems in constructing an SDS are how to deal with uncertainty and how to manage cooperative dialogue.

{2.1 Dealing with Uncertainty}
In an SDS, the user's input involves various kinds of ambiguity and uncertainty, such as uncertain speech recognition results, syntactic and semantic ambiguity, ill-formed utterances, and uncertainty about the user's intention. Many probabilistic methods have been developed for each of these problems separately. However, to deal with all of them in an integrated manner, we need a general framework for probabilistic reasoning. We therefore adopt the Bayesian network formalism.
A Bayesian network is a kind of probabilistic causal network. Each node represents a random variable, that is, the value of a proposition; in this paper, every random variable is binary, taking the value true or false. Each link represents a kind of causal relationship. Each node is assigned a certainty measure that is consistent with the axioms of probability theory. The computational cost of updating the certainty measures is proportional to the length of the longest path in the network. Because a Bayesian network propagates evidential messages in both directions, it can combine multiple evidence inputs. A Bayesian network is therefore well suited to treating uncertainty in natural language processing.
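To make the idea of combining evidence concrete, here is a minimal Python sketch of a three-node network with binary variables: a hypothetical proposition C ("an instance of some conceptual class is present") whose effects are two hypothetical phrase-hypothesis nodes P1 and P2. All node names and probabilities are invented for illustration. Beliefs are computed by brute-force enumeration of the joint distribution, which is exact but exponential in the number of nodes; it stands in for, and is not, the incremental message propagation whose cost is discussed above.

```python
from itertools import product

# Hypothetical network C -> P1, C -> P2; every node is binary (True/False).
PRIOR_C = 0.3                      # P(C = True), invented value
LIKELIHOOD = {                     # P(Pi = True | C), invented values
    "P1": {True: 0.8, False: 0.1},
    "P2": {True: 0.7, False: 0.2},
}


def bernoulli(p_true: float, value: bool) -> float:
    """Probability that a binary variable with P(True) = p_true takes `value`."""
    return p_true if value else 1.0 - p_true


def joint(c: bool, p1: bool, p2: bool) -> float:
    """Joint probability, factorized along the causal links."""
    return (bernoulli(PRIOR_C, c)
            * bernoulli(LIKELIHOOD["P1"][c], p1)
            * bernoulli(LIKELIHOOD["P2"][c], p2))


def posterior(query: str, evidence: dict) -> float:
    """P(query = True | evidence) by summing the joint over consistent worlds."""
    numerator = denominator = 0.0
    for c, p1, p2 in product([True, False], repeat=3):
        world = {"C": c, "P1": p1, "P2": p2}
        if any(world[name] != value for name, value in evidence.items()):
            continue  # this world contradicts the observed evidence
        weight = joint(c, p1, p2)
        denominator += weight
        if world[query]:
            numerator += weight
    return numerator / denominator


print(round(posterior("C", {}), 3))                          # 0.3
print(round(posterior("C", {"P1": True}), 3))                # 0.774
print(round(posterior("C", {"P1": True, "P2": True}), 3))    # 0.923
print(round(posterior("P2", {"P1": True}), 3))               # 0.587
```

Observing P1 raises the belief in its cause C (evidence flowing against the link direction), and it also raises the belief in P2 from 0.35 to 0.587, because the evidence flows up to C and back down. Observing both phrases raises the belief in C further, which is the multiple-evidence behaviour described above.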
We regard utterance understanding as the dynamic construction of a Bayesian network, which we call the Conversational Space (CS). The input to CS consists of phrase hypotheses produced by a phrase spotting module. Each phrase hypothesis is represented as a node whose certainty measure is its spotting score. From this evidence, instances of several linguistic classes are inferred: the network expansion procedure adds propositions stating that an instance of a conceptual class, an utterance type class, or an action type class exists. The way each kind of uncertainty is dealt with is described in section 3. CS is also used to record the dialogue history and to generate surface responses.
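The expansion procedure can be pictured with a small data-structure sketch in Python, assuming a hypothetical lexicon that maps each spotted phrase to the class nodes it evokes, each with a link strength. The paper does not say how the conditional probabilities of class nodes are specified; this sketch assumes a noisy-OR combination over the phrase parents and treats each spotting score as the prior probability of a root phrase node. Both are simplifications chosen for illustration, and all phrases, class names, and numbers are invented.

```python
from dataclasses import dataclass, field

# Hypothetical lexicon: phrase -> list of (class node, class kind, link strength).
LEXICON = {
    "hotel": [("Accommodation", "concept", 0.9)],
    "room": [("Accommodation", "concept", 0.6)],
    "I'd like to book": [("Request", "utterance_type", 0.8),
                         ("Reserve", "action_type", 0.7)],
}


@dataclass
class Node:
    name: str
    kind: str                  # "phrase", "concept", "utterance_type", "action_type"
    belief: float = 0.0        # certainty measure, read as P(proposition is true)
    parents: dict = field(default_factory=dict)  # parent name -> link strength


def noisy_or(cs: dict, node: Node) -> float:
    """P(node true) under noisy-OR, assuming independent root parents and no leak."""
    p_false = 1.0
    for parent, strength in node.parents.items():
        p_false *= 1.0 - strength * cs[parent].belief
    return 1.0 - p_false


def add_phrase_hypothesis(cs: dict, phrase: str, spotting_score: float) -> None:
    """Expand CS with a phrase node and every class node the phrase evokes."""
    cs[phrase] = Node(phrase, "phrase", belief=spotting_score)
    for cls, kind, strength in LEXICON.get(phrase, []):
        node = cs.setdefault(cls, Node(cls, kind))  # create the class node on demand
        node.parents[phrase] = strength
        node.belief = noisy_or(cs, node)


cs = {}
add_phrase_hypothesis(cs, "hotel", 0.7)
add_phrase_hypothesis(cs, "room", 0.5)
add_phrase_hypothesis(cs, "I'd like to book", 0.6)
for node in cs.values():
    if node.kind != "phrase":
        print(node.name, node.kind, round(node.belief, 3))
```

With these invented numbers, "Accommodation" is supported by two phrases and reaches a belief of 0.741, higher than either phrase gives alone (0.63 and 0.3). The noisy-OR formula is exact only when the parent nodes are independent, which competing phrase hypotheses from one utterance generally are not.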

{2.2 Managing Cooperative Dialogue}
An SDS will be perceived as 'cooperative' if it gives proper answers and/or good suggestions. To generate such responses, the SDS must recognize the user's plan and select an appropriate speech act as the system's response. However, if the system cannot respond until the user's plan is recognized, the dialogue does not proceed smoothly. An SDS should therefore have a dialogue strategy that is both cooperative and responsive.
In our model, task-domain knowledge for plan recognition is represented by a static network whose main structure is the same as that of an Event Hierarchy. It represents the relationships between plans and their subplans, and between plans and actions. We call this network the Problem Solving Space (PSS). We apply the minimal covering method to plan recognition in PSS. The essence of this procedure is to find a minimal forest that covers all the subplans and actions achieved so far.
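One way to read minimal covering in an Event Hierarchy is sketched below in Python: find the smallest sets of top-level plans whose decompositions jointly include every observed subplan or action. The hierarchy, the plan names, and the choice to return all minimum-size covers are hypothetical, and the exhaustive search over subsets is only practical for small hierarchies.

```python
from itertools import combinations

# Hypothetical Event Hierarchy: plan -> subplans/actions (decomposition links).
DECOMPOSITION = {
    "TakeTrip": ["ReserveHotel", "BuyTicket"],
    "AttendConference": ["Register", "ReserveHotel"],
    "ReserveHotel": ["AskVacancy", "GiveDates"],
    "BuyTicket": ["AskFare", "GiveDates"],
    "Register": ["GiveName"],
}
TOP_LEVEL = ["TakeTrip", "AttendConference"]


def descendants(node: str) -> set:
    """All subplans and actions reachable from `node` by decomposition links."""
    found = set()
    stack = [node]
    while stack:
        for child in DECOMPOSITION.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def minimal_covers(observed: set) -> list:
    """Every smallest set of top-level plans whose descendants include all observations."""
    for size in range(1, len(TOP_LEVEL) + 1):
        covers = [set(combo) for combo in combinations(TOP_LEVEL, size)
                  if observed <= set().union(*(descendants(p) for p in combo))]
        if covers:
            return covers
    return []  # some observation is not explained by any top-level plan


print([sorted(c) for c in minimal_covers({"AskVacancy", "AskFare"})])  # [['TakeTrip']]
```

With this toy hierarchy, observing only AskVacancy yields two alternative single-plan covers, observing AskVacancy and AskFare yields the single tree rooted at TakeTrip, and observing AskFare and GiveName requires both top-level plans.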
Depending on the result of plan recognition, the system uses one of two processing types: 'surface understanding' and 'deep understanding'.
'Surface understanding' is a process that generates a response without using information about the user's plan or about the dialogue context preceding the user's latest utterance. 'Deep understanding', in contrast, is a process that identifies the user's plan, updates the mental state, and selects a cooperative reaction for the system.
The choice between surface and deep processing is made on the basis of the plan recognition result. If the recognized minimal cover includes only one top-level plan node (that is, the minimal cover is a tree), the user's plan is regarded as recognized and the deep understanding process is chosen. When the minimal cover has several top-level plan nodes, we call the situation one of competing plans. In this situation, the system does not have enough information to identify a single user plan, so the surface understanding process is used to generate an immediate response. The surface response is generated in CS using trigrams of utterance types.
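The decision rule and the trigram-based surface response can be sketched together in Python. The sketch assumes that plan recognition returns a list of minimal covers, each a set of top-level plan names, and it treats both a multi-plan cover and several alternative covers as competing plans; that is our reading rather than something the text specifies. The utterance-type labels, the toy training dialogues, and the fallback type "clarification-request" are all hypothetical, and the counts are unsmoothed maximum-likelihood trigrams.

```python
from collections import Counter, defaultdict


def choose_process(covers: list) -> str:
    """'deep' if plan recognition yields exactly one top-level plan, else 'surface'."""
    if len(covers) == 1 and len(covers[0]) == 1:
        return "deep"
    return "surface"  # competing plans (or no explanation): respond immediately


def train_trigrams(dialogues: list) -> dict:
    """Count which utterance type follows each pair of preceding utterance types."""
    counts = defaultdict(Counter)
    for types in dialogues:
        padded = ["<s>", "<s>"] + types  # pad so the first utterances have a context
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            counts[(a, b)][c] += 1
    return counts


def surface_response_type(counts: dict, history: list) -> str:
    """Most frequent utterance type following the last two types in `history`."""
    context = tuple((["<s>", "<s>"] + history)[-2:])
    if not counts.get(context):
        return "clarification-request"  # hypothetical fallback for unseen contexts
    return counts[context].most_common(1)[0][0]


# Hypothetical utterance-type sequences (user and system turns alternate).
DIALOGUES = [
    ["request", "wh-question", "inform", "confirm", "accept"],
    ["request", "wh-question", "inform", "confirm", "reject"],
    ["wh-question", "answer", "thank"],
]
counts = train_trigrams(DIALOGUES)
print(choose_process([{"TakeTrip"}, {"AttendConference"}]))  # surface
print(surface_response_type(counts, ["wh-question"]))        # answer
```

Here a user "wh-question" at the start of a dialogue is answered with an "answer", and an "inform" following a system "wh-question" is followed by "confirm", without consulting any plan.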

{2.3 Five-Step Model of Dialogue Processing}
To construct CS and to make inferences in PSS, the process from understanding the user's utterance to generating the system's response is divided into five steps: (1) meaning understanding, (2) intention understanding, (3) communicative effect, (4) reaction generation, and (5) response generation.
The meaning understanding step constructs CS. The intention understanding step maps utterance types in CS to actions in PSS. The communicative effect step records the status of problem solving and the user's declared preferences in the mental state. The reaction generation step selects a cooperative reaction in PSS and expands it into utterance types in CS. Finally, the response generation step composes the surface expression of the system's response from part of CS.
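The order of the five steps can be sketched as a simple Python pipeline that threads one dialogue state through them. Every table, utterance-type label, plan action, preference, and template here is hypothetical, and each step is reduced to a small stand-in (for example, meaning understanding just keeps the most certain utterance-type hypothesis instead of building a full Bayesian network).

```python
from dataclasses import dataclass, field

# Hypothetical tables; their contents are invented for illustration.
UTTERANCE_TO_ACTION = {"request-reservation": "ReserveHotel",
                       "inform-dates": "GiveDates"}
TEMPLATES = {"wh-question": "Which dates would you like?",
             "confirm": "Shall I reserve the room for those dates?"}


@dataclass
class DialogueState:
    cs: dict = field(default_factory=dict)            # simplified Conversational Space
    achieved: list = field(default_factory=list)      # PSS actions achieved so far
    mental_state: dict = field(default_factory=dict)  # problem-solving status, preferences


def meaning_understanding(state: DialogueState, hypotheses: dict) -> None:
    """(1) Build CS: keep the user utterance type with the highest certainty."""
    utype, certainty = max(hypotheses.items(), key=lambda kv: kv[1])
    state.cs["user_type"], state.cs["certainty"] = utype, certainty


def intention_understanding(state: DialogueState) -> None:
    """(2) Map the utterance type in CS to an action in PSS."""
    action = UTTERANCE_TO_ACTION.get(state.cs["user_type"])
    if action is not None:
        state.achieved.append(action)


def communicative_effect(state: DialogueState, preferences: dict) -> None:
    """(3) Record the problem-solving status and declared preferences."""
    state.mental_state["achieved"] = list(state.achieved)
    state.mental_state.setdefault("preferences", {}).update(preferences)


def reaction_generation(state: DialogueState) -> None:
    """(4) Select a reaction in PSS and expand it to a CS utterance type."""
    if "GiveDates" not in state.achieved:
        state.cs["system_type"] = "wh-question"  # reaction: ask for the missing dates
    else:
        state.cs["system_type"] = "confirm"      # reaction: confirm the reservation


def response_generation(state: DialogueState) -> str:
    """(5) Compose a surface expression from part of CS."""
    return TEMPLATES[state.cs["system_type"]]


def process_turn(state: DialogueState, hypotheses: dict, preferences=None) -> str:
    meaning_understanding(state, hypotheses)
    intention_understanding(state)
    communicative_effect(state, preferences or {})
    reaction_generation(state)
    return response_generation(state)


state = DialogueState()
print(process_turn(state, {"request-reservation": 0.8, "inform-dates": 0.3}))
print(process_turn(state, {"inform-dates": 0.9}, {"nights": 2}))
```

After the first turn the dates are still missing, so the reaction is a question; after the second, the mental state records both achieved actions and the declared preference, and the reaction becomes a confirmation.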

{3. Conclusion}
We have presented an outline of a five-step model of cooperative problem-solving dialogue. We have shown that robustness can be implemented within a Bayesian network framework, and that cooperativeness can be implemented by two-level processing: surface understanding and deep understanding. For this purpose, we introduced the Conversational Space and the Problem Solving Space, and explained the roles of these two spaces within the five-step model of the whole dialogue process.

Keywords: dialogue model, speech understanding, plan recognition, Bayesian network