Thesaurus Construction and Analysis Method for Dialogue Understanding

Subtitle: Thesaurus Construction for Dialogue Understanding

Hiroaki TSURUMARU, Hideyuki MAEDA, Kazuhiro YAMAMOTO

Department of Electrical Engineering and Computer Science, Nagasaki University

1-14 Bunkyou-machi, Nagasaki 852, Japan

e-mail: turumaru@ec.nagasaki-u.ac.jp

A thesaurus that records semantic relations between words, such as hyponymy and the part-whole relation, is an approximate model of common sense and is very important for dialogue understanding. However, the relations recorded in existing thesauri, which are published for human readers, are insufficient for natural language processing. This study aims to clarify a method for constructing a thesaurus based on these semantic relations, and to address the problems of ellipsis and anaphora in dialogue. The main results of the current year's study are as follows. (1) We have been working on word knowledge acquisition from an on-line Japanese language dictionary (a machine-readable dictionary, MRD) and developing a computer-aided thesaurus construction system (the thesaurus system). Using the results of this work, we have built an experimental pilot thesaurus with about 80,000 noun entries. (2) The pilot thesaurus includes about 12,000 words that have no connection to any superordinate (called local maximal words for convenience). We have therefore proposed a way of assigning superordinates to the local maximal words by means of their synonyms. The experimental results show that about 60% of the local maximal words are indirectly connected to superordinates in this way. (3) The part-whole relation is useful for inferring some kinds of omitted information in dialogue, but only a few hundred entry words in our pilot thesaurus have a part-whole relation. We have therefore proposed three types of logical definitions of the part-whole relation, and a logical method for deducing new part-whole word pairs from given word pairs that stand in part-whole and other relations.

Keywords: thesaurus, semantic dictionary, word knowledge base, conceptual hierarchy, dialogue understanding, natural language processing
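
As an illustration of items (2) and (3) of the abstract, the following minimal Python sketch shows one way the two ideas might look on toy data. Everything in it is an assumption made for illustration: the dictionaries `HYPERNYM`, `SYNONYMS` and `PART_OF`, the function names, and the example words are hypothetical and are not taken from the pilot thesaurus. The inheritance rule in `inherit_parts` (a part of a superordinate is also treated as a part of its hyponyms) is one plausible deduction rule, not necessarily one of the three logical definitions proposed in the study.

```python
# Hypothetical toy data; none of these entries come from the pilot thesaurus.
HYPERNYM = {                 # word -> direct superordinate
    "dog": "animal",
    "automobile": "vehicle",
}
SYNONYMS = [{"car", "automobile"}, {"gadget", "gizmo"}]
PART_OF = {("wheel", "vehicle")}   # given (part, whole) pairs
ENTRIES = {"dog", "animal", "automobile", "vehicle",
           "car", "gadget", "gizmo", "wheel"}


def local_maximal_words(entries, hypernym):
    """Return entry words for which no superordinate is recorded."""
    return {word for word in entries if word not in hypernym}


def borrowed_superordinate(word, hypernym, synonym_sets):
    """Return the superordinate of a synonym of `word`, or None if there is none."""
    for group in synonym_sets:
        if word in group:
            for other in sorted(group - {word}):
                if other in hypernym:
                    return hypernym[other]
    return None


def assign_via_synonyms(entries, hypernym, synonym_sets):
    """Indirectly connect local maximal words to superordinates via synonyms."""
    assigned = {}
    for word in local_maximal_words(entries, hypernym):
        superordinate = borrowed_superordinate(word, hypernym, synonym_sets)
        if superordinate is not None:
            assigned[word] = superordinate
    return assigned


def inherit_parts(part_of, hypernym):
    """Assumed rule: if (part, whole) is given and X is a hyponym of whole,
    derive (part, X). Only direct hyponyms are considered here."""
    derived = set()
    for hyponym, superordinate in hypernym.items():
        for part, whole in part_of:
            if whole == superordinate:
                derived.add((part, hyponym))
    return derived - set(part_of)


if __name__ == "__main__":
    assigned = assign_via_synonyms(ENTRIES, HYPERNYM, SYNONYMS)
    print("indirect superordinates:", assigned)

    # Use both the recorded and the indirectly assigned superordinates.
    extended = {**HYPERNYM, **assigned}
    print("derived part-whole pairs:", sorted(inherit_parts(PART_OF, extended)))
```

In this toy setting, "car" has no superordinate of its own but borrows "vehicle" from its synonym "automobile", while "gadget" and "gizmo" remain unconnected because neither synonym has a superordinate. The fraction of local maximal words that receive a superordinate this way corresponds to the coverage figure reported in item (2).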