Thesaurus Construction and Analysis Method for Dialogue Understanding
--Thesaurus Construction for Dialogue Understanding--
Hiroaki TSURUMARU, Hideyuki MAEDA, and Akihiro KAWASHIMA
Department of Electrical Engineering and Computer Science,
Nagasaki University
1-14 Bunkyou-machi, Nagasaki 852, JAPAN
e-mail: turumaru@ec.nagasaki-u.ac.jp
Many ellipses and demonstrative pronouns occur in dialogue.
Generally speaking, omitted words (or phrases) and pronominal
references are resolved by means of common sense and discourse
information. A serious problem for dialogue understanding is
that common sense has no clear definition.
A thesaurus consisting of semantic (hierarchical) relations between
words, such as the upper/lower relation or the part/whole relation,
can be regarded as an approximate model of common sense.
Existing thesauri include ``Bunrui-Goi-Hyo (Word List by Semantic
Principles)'' and ``Roget's Thesaurus''. However, they are not
always sufficient for natural language processing, because they
are intended mainly for human readers.
This study aims to clarify a method for constructing a thesaurus
based on hierarchical relations between the concepts of words, such
as the upper/lower relation and the part/whole relation, and to
address the problems of applying the thesaurus to dialogue
understanding. Here we regard each sense of a word as one
concept. How, and from what source, to obtain these hierarchical
relations is one of the most important problems in constructing the
thesaurus.
We have been studying how to acquire these hierarchical relations
from the definition sentences in an on-line Japanese dictionary,
and developing a software system for computer-aided thesaurus
construction. This year's work covers three main topics. First, we
review the algorithm for extracting the hierarchical relations.
Second, we discuss the evaluation of the trial thesaurus that has
been built experimentally from the results of this work. Third, we
discuss the application of the thesaurus to inferring elided words
in dialogue.
We now describe these three topics more specifically.
(1) Concerning the extraction algorithm, we have reviewed it
from a theoretical viewpoint. The basic idea of extracting
the hierarchical relations is as follows. A definition sentence
generally contains one or more core words expressing the central
meaning of the word sense, which we call the definition word(s).
We extract the definition word(s) and the relational information,
and determine the semantic relation between the entry word and the
definition word. Here the semantic relations include the synonymous
relation and the element/set relation as well as the upper/lower
relation and the part/whole relation. We regard the synonymous and
element/set relations as hierarchical relations in a broad sense.
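
The extraction idea in (1) can be illustrated with a minimal sketch.
It is not the authors' algorithm: the original system works on
Japanese definition sentences, whereas this sketch assumes English
glosses and a small set of hypothetical surface patterns. The names
Relation, PATTERNS and extract_relation are introduced here for
illustration only.

```python
import re
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    entry: str            # entry word (one sense of the word)
    kind: str             # relation type between entry and definition word
    definition_word: str  # core word taken from the definition sentence


# Hypothetical surface patterns (an assumption, not from the paper),
# tried in order. Each captures the definition word as group 1.
PATTERNS = [
    ("part/whole", re.compile(r"^(?:a|an|the) part of (?:a|an|the) (\w+)")),
    ("element/set", re.compile(r"^(?:a|an|the) member of (?:a|an|the) (\w+)")),
    ("synonym", re.compile(r"^same as (\w+)")),
    ("upper/lower", re.compile(r"^(?:a|an|the) (?:kind|type) of (\w+)")),
]


def extract_relation(entry: str, definition: str) -> Optional[Relation]:
    """Return the first relation whose pattern matches the definition."""
    text = definition.strip().lower()
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return Relation(entry, kind, match.group(1))
    return None  # no pattern applies; leave the entry for manual review


if __name__ == "__main__":
    print(extract_relation("sparrow", "A kind of bird that lives near houses."))
    print(extract_relation("wheel", "A part of a car that turns."))
```

Returning None for unmatched definitions reflects the computer-aided
setting described above: entries the patterns cannot handle would be
left to a human editor.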
(2) Concerning the evaluation of the trial thesaurus, we have
investigated the following:
(3) Concerning the application of the thesaurus, we have studied
an algorithm for inferring omitted words or phrases in dialogue,
using the thesaurus and the IPAL basic verbs dictionary, in order
to verify the validity of the trial thesaurus. The outline of the
algorithm is as follows:
This algorithm can be used to handle pronominal and
anaphoric reference.
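
One way a thesaurus and a verb dictionary could be combined to infer
omitted words is sketched below. This is an assumption made for
illustration and should not be read as the authors' algorithm. The
thesaurus is reduced to toy upper/lower links, the verb dictionary to
toy case frames (not actual IPAL entries), and the names UPPER,
CASE_FRAMES, is_a and infer_omitted are hypothetical.

```python
from typing import Dict, List, Optional

# Toy upper/lower links: lower concept -> upper concept (hypothetical data).
UPPER: Dict[str, str] = {
    "apple": "fruit",
    "fruit": "food",
    "taro": "person",
    "person": "animate",
}

# Toy case frames: verb -> case slot -> required semantic category.
# These are illustrative and are not taken from the IPAL dictionary.
CASE_FRAMES: Dict[str, Dict[str, str]] = {
    "eat": {"agent": "animate", "object": "food"},
}


def is_a(concept: str, category: str) -> bool:
    """Follow upper links from concept; True if category is reached."""
    seen = set()
    current: Optional[str] = concept
    while current is not None and current not in seen:
        if current == category:
            return True
        seen.add(current)  # guards against cycles in the link table
        current = UPPER.get(current)
    return False


def infer_omitted(verb: str, filled: Dict[str, str],
                  discourse: List[str]) -> Dict[str, str]:
    """Fill each empty case slot with the most recently mentioned
    discourse entity that satisfies the slot's semantic category."""
    guesses: Dict[str, str] = {}
    for slot, category in CASE_FRAMES.get(verb, {}).items():
        if slot in filled:
            continue
        for candidate in reversed(discourse):  # most recent first
            if is_a(candidate, category):
                guesses[slot] = candidate
                break
    return guesses


if __name__ == "__main__":
    # "Taro bought an apple. (He) ate (it)."
    print(infer_omitted("eat", {}, ["taro", "apple"]))
```

Preferring the most recent compatible entity is only one possible
selection policy; the same lookup could in principle rank candidates
for a pronoun instead of an empty slot.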
We collected the dialogue data for the experiments from the texts of
NHK Sequel Basic English (1991).
Keywords: thesaurus, semantic dictionary, word knowledge, conceptual hierarchy