A Cognitive Science Approach to the Problem of Dialogue:
Roles of language information, speech information, and visual
information in understanding the dialogue partner
Jun-ichi ABE
Department of Behavioral Science,
Faculty of Letters,
Hokkaido University
Sapporo 060, JAPAN
e-mail: abe@hubs.hokudai.ac.jp
To engage successfully in dialogue, we need to take into account
the dialogue partner and his/her situation. Consider a face-to-face
dialogue. Information about the partner is conveyed mainly through
the auditory and visual modalities. Through the auditory modality, we
receive the partner's speech sounds, from which we extract the
partner's linguistic information. Through the visual modality, we
obtain information about the partner's face and body.
We believe the following questions must be asked to clarify the
cognitive process by which a dialogue participant understands his/her
partner: 1) Which parts of the auditory and visual information are
used as cues for understanding? 2) What kinds of inference are made
from these cues, and how detailed are these inferences?
This paper reports part of our experimental research program, which
attempts to answer these questions.
{Method}
Stimulus materials and design of the experiment
From TV programs and movies, we recorded many scenes in which a
performer speaks toward the camera with the upper half of his or her
body in the frame. From these recordings we chose 14 scenes, each 5
to 7 s long, in which the speech is continuous and there is no
background music. These scenes were loaded onto a computer, and their
contents were edited as follows.
{Procedure}
The subjects were asked to write down on a response sheet what they
judged (inferred) about the speaker. The question items were as
follows: 1) What do you think the speaker feels while speaking? How
did you judge the speaker's feelings? 2) What is your judgment of the
speaker's character? How did you judge the speaker's character?
3) In what context do you guess this scene took place? How were you
able to guess this?
On each presentation trial, the subject was required to answer all
three questions and to record each judgment on the response sheet.
On each trial, the experimenter also recorded the time the subject
took to complete the response.
{Subject}
We recruited 28 undergraduate students at Hokkaido University.
Four subjects were randomly allocated to each of the 7 presentation
conditions.
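The paper does not describe how the random allocation was carried out. A minimal sketch of one common scheme, assuming a simple shuffle of the 28 subjects followed by splitting the list into seven consecutive blocks of four, is shown below in standard-library Python; the helper name `allocate` and the seed value are hypothetical.

```python
import random


def allocate(subject_ids, n_conditions=7, per_condition=4, seed=None):
    """Randomly split subjects into equal-sized groups, one per condition.

    Hypothetical helper: shuffle the subject list, then give condition 1 the
    first `per_condition` subjects, condition 2 the next block, and so on.
    """
    subjects = list(subject_ids)
    if len(subjects) != n_conditions * per_condition:
        raise ValueError("need exactly n_conditions * per_condition subjects")
    rng = random.Random(seed)  # seeded generator so an allocation can be reproduced
    rng.shuffle(subjects)
    return {
        cond: subjects[(cond - 1) * per_condition : cond * per_condition]
        for cond in range(1, n_conditions + 1)
    }


# 28 subjects numbered 1..28, 4 per condition across conditions 1..7.
groups = allocate(range(1, 29), seed=2024)
for cond, members in groups.items():
    print(cond, members)
```

Shuffling once and slicing guarantees exactly four subjects per condition, which independent random draws per subject would not.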
{Results and Discussion}
The data were standardized and submitted to a two-way analysis of
variance (7 presentation conditions × 14 scenes).
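The paper does not state how the 7 × 14 design was modelled. If, as the Procedure suggests, each subject rated all 14 scenes within a single presentation condition, one plausible model is a split-plot (mixed) ANOVA with condition as a between-subjects factor and scene as a within-subjects factor. The following is a minimal sketch under that assumption, in standard-library Python; the function name `split_plot_anova` and the synthetic scores are hypothetical, and p-values (which require the F distribution) are omitted. Because the standardization method is not specified, the function takes already-transformed scores; note that applying one linear z-transformation to all scores leaves every F ratio unchanged.

```python
import random
from statistics import fmean


def split_plot_anova(y):
    """Two-way mixed (split-plot) ANOVA for a balanced design.

    y[i][j][k] is the score of subject j (nested in condition i) on scene k.
    Condition is a between-subjects factor; scene is a within-subjects factor.
    Returns {source: (SS, df, F)}, with F = None for the two error terms.
    """
    a, n, b = len(y), len(y[0]), len(y[0][0])
    cells = [(i, j, k) for i in range(a) for j in range(n) for k in range(b)]
    grand = fmean(y[i][j][k] for i, j, k in cells)
    m_a = [fmean(v for subj in y[i] for v in subj) for i in range(a)]
    m_s = [[fmean(y[i][j]) for j in range(n)] for i in range(a)]
    m_b = [fmean(y[i][j][k] for i in range(a) for j in range(n)) for k in range(b)]
    m_ab = [[fmean(y[i][j][k] for j in range(n)) for k in range(b)] for i in range(a)]

    ss_total = sum((y[i][j][k] - grand) ** 2 for i, j, k in cells)
    ss_a = n * b * sum((m - grand) ** 2 for m in m_a)
    ss_s = b * sum((m_s[i][j] - m_a[i]) ** 2 for i in range(a) for j in range(n))
    ss_b = a * n * sum((m - grand) ** 2 for m in m_b)
    ss_ab = n * sum((m_ab[i][k] - m_a[i] - m_b[k] + grand) ** 2
                    for i in range(a) for k in range(b))
    # Residual: scene x subjects-within-condition, the error term for B and A x B.
    ss_err = ss_total - ss_a - ss_s - ss_b - ss_ab

    df_a, df_s = a - 1, a * (n - 1)
    df_b, df_ab, df_err = b - 1, (a - 1) * (b - 1), a * (n - 1) * (b - 1)
    ms_s, ms_err = ss_s / df_s, ss_err / df_err
    return {
        "condition": (ss_a, df_a, (ss_a / df_a) / ms_s),
        "subjects(condition)": (ss_s, df_s, None),
        "scene": (ss_b, df_b, (ss_b / df_b) / ms_err),
        "condition x scene": (ss_ab, df_ab, (ss_ab / df_ab) / ms_err),
        "scene x subjects(condition)": (ss_err, df_err, None),
    }


# Synthetic placeholder scores (NOT the paper's data): 7 conditions x 4 subjects x 14 scenes.
rng = random.Random(1)
scores = [[[rng.gauss(0.0, 1.0) for _ in range(14)] for _ in range(4)] for _ in range(7)]
table = split_plot_anova(scores)
for source, (ss, df, f) in table.items():
    print(f"{source:28s} SS={ss:8.3f} df={df:3d}", "" if f is None else f"F={f:.3f}")
```

In this model the condition effect is tested against between-subject variability, while scene and the interaction are tested against the scene × subject residual; if the original analysis instead treated all observations as independent, its error terms would differ.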
In the analysis of the number of characters written, the main effect
of presentation condition was significant, as was the main effect of
scene. The interaction, however, was not significant.
Multiple comparisons showed significant differences between the
following pairs of presentation conditions: condition 3 (picture
only) vs. condition 6 (picture + voice + caption); condition 4 (voice
only) vs. condition 1 (picture + voice), condition 6 (picture + voice
+ caption), and condition 7 (voice + caption); and condition 5
(caption only) vs. condition 1 (picture + voice), condition 6
(picture + voice + caption), and condition 7 (voice + caption).
We expected the number of characters in the responses to increase
monotonically with the amount of information received. The results
confirmed this expectation: more characters were written in
conditions 1 (picture + voice) and 6 (picture + voice + caption), and
fewer in conditions 3 (picture only), 4 (voice only), and 5 (caption
only).
It is noteworthy that condition 4 (voice only) and condition 7
(voice + caption) differed significantly. This result was unexpected,
because we had assumed that adding captions to the voice would not by
itself increase the amount of information. Nevertheless, adding
captions did increase the number of characters written on the
response sheet. We believe that reading the captions facilitated the
activation of the subject's conceptual and schematic knowledge, which
may have allowed him or her to make a variety of inferences in a
short time.
In the analysis of response times, the main effects of presentation
condition and of scene were significant, but the interaction was not.
Multiple comparisons showed significant differences between condition
4 and condition 6, and between condition 5 and conditions 1, 2, and
6. The differences in response time between presentation conditions
followed almost the same pattern as the differences in the number of
characters written: conditions providing more information tended to
produce longer response times, and vice versa.
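One way to quantify "almost the same pattern" would be a rank correlation between the seven condition means for number of characters and for response time. The paper reports no such statistic; the sketch below is an illustrative addition that computes Spearman's rho (the Pearson correlation of the ranks, with tied values given their average rank) in standard-library Python. The function names are hypothetical.

```python
from statistics import fmean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value in sorted order is a tie.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = sum((p - mx) ** 2 for p in rx) ** 0.5
    sy = sum((q - my) ** 2 for q in ry) ** 0.5
    return cov / (sx * sy)
```

Called as `spearman_rho(char_means, rt_means)` (hypothetical variable names) with the seven condition means listed in the same condition order, a value near +1 would indicate that the two measures rank the conditions in nearly the same order. With only seven points, such a coefficient is descriptive rather than a strong test.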
Taken as a whole, the results suggest that written language is a
unique source of information. We had predicted no difference between
condition 6 (picture + voice + caption) and condition 1 (picture +
voice) in terms of the actual amount of information conveyed.
However, both the response time and the amount of response were
greater in condition 6 than in condition 1. These results suggest
that visual language information presented as written characters can
strongly activate linguistic and schematic knowledge.
Keywords: dialogue, cognitive processes, understanding, psychological experiment