A Language Analysis Method for Spoken Dialogue Understanding

-- Incremental Word Sense Disambiguation Based on Lexical Cohesion --

Manabu OKUMURA

School of Information Science, Japan Advanced Institute of Science and Technology

Tatsunokuchi, Ishikawa 923-12, Japan

e-mail: oku@jaist.ac.jp

A text is not a mere set of unrelated sentences. Rather, the sentences in a text are about the same thing and are connected to each other. Lexical cohesion is said to contribute to this connection between sentences. We call a sequence of words that are in a lexical cohesion relation with each other a lexical chain. Lexical chains tend to indicate portions of a text that form a semantic unit, and they therefore provide a local context that aids in resolving word sense ambiguity. In this paper, we describe how word sense ambiguity can be resolved with the aid of lexical cohesion. As lexical chains are generated incrementally, they are recorded in a register in order of salience, where the salience of a lexical chain is based on its recency and length. Since a more salient lexical chain represents the nearer local context, we check lexical cohesion between the current word and the lexical chains in order of salience, in tandem with the generation of lexical chains. In this way we realize incremental word sense disambiguation based on the contextual information that lexical chains reveal. We apply the algorithm to real texts; the average performance is 63.4%. We consider this performance promising for the following reason: lexical cohesion is not the only knowledge source for word sense disambiguation, and it proves useful at least as a supplement to our earlier framework, which used case frames.

Keywords: lexical chain, incremental NLP, local context
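
The incremental procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: two words are treated as cohesive when they share a category in a toy thesaurus (the hypothetical THESAURUS table), salience ranks chains by recency first and by length second, and an ambiguous word that matches no chain is simply left unresolved. The names Chain, salience, and disambiguate are likewise hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical toy thesaurus: word -> candidate sense categories.
# Assumption: two words cohere lexically when they share a category.
THESAURUS = {
    "bank": ["finance", "river"],
    "money": ["finance"],
    "loan": ["finance"],
    "water": ["river"],
    "shore": ["river"],
    "interest": ["finance", "emotion"],
}


@dataclass
class Chain:
    """A lexical chain: words that share one sense category."""
    sense: str
    words: list = field(default_factory=list)
    last_pos: int = -1  # position of the most recently added word


def salience(chain):
    # Assumed ranking: the more recent chain wins; length breaks ties.
    return (chain.last_pos, len(chain.words))


def disambiguate(words):
    """Assign a sense to each word incrementally, left to right."""
    register = []  # chains, kept sorted with the most salient first
    result = []
    for pos, word in enumerate(words):
        senses = THESAURUS.get(word, [])
        chosen = None
        # Check cohesion against chains in order of salience.
        for chain in register:
            if chain.sense in senses:
                chosen = chain
                break
        # An unambiguous word with no cohesive chain starts a new chain.
        if chosen is None and len(senses) == 1:
            chosen = Chain(sense=senses[0])
            register.append(chosen)
        if chosen is not None:
            chosen.words.append(word)
            chosen.last_pos = pos
            register.sort(key=salience, reverse=True)
        # Ambiguous words with no cohesive chain stay unresolved (None).
        result.append((word, chosen.sense if chosen else None))
    return result


if __name__ == "__main__":
    for word, sense in disambiguate(["money", "bank", "water", "bank", "interest"]):
        print(word, sense)
```

In this sketch the first "bank" joins the chain started by "money", while the second "bank" joins the more recent chain started by "water", which shows how the salience ordering lets the nearest local context decide the sense. A fuller treatment would also need a policy for words that remain ambiguous when no chain coheres with them, which the sketch leaves open.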