Speech Recognition Mechanism of Second Language Learners in Short-Term Memory
Kazuko TOJO
Department of English Literature, Kyushu Women's University
Yahatanishi-ku, Kitakyushu, Fukuoka 807 JAPAN
In an effort to realize effective man-machine interaction, the
analysis of human dialogue in a natural environment yields useful
insights. No less important in the area of human dialogue is the
study of communicative behavior in one's second language, especially
because the acquisition process is expected to reveal important
aspects of each developmental phase. Among the
so-called four skills of language proficiency (reading, writing,
speaking and listening), listening skill has direct relevance to the
treatment of speech recognition. Few studies deal with the learner's
speech recognition process because of its latent and receptive nature.
The purpose of this study is to clarify the speech recognition
mechanism of second language learners by analyzing how Japanese
learners of English in various stages of listening proficiency
recognize spoken English sentences and increase their comprehension.
{Framework of study}
In comprehending a spoken English sentence, provided the
linguistic elements such as semantics and syntax are under the
learner's control, speech rate (speed) and the length of the sentence
play a crucial role. Faster and longer sentences pose more
difficulty in comprehension, especially when the learner's speech
recognition skill is not sufficient. Here, short-term memory plays a
very important role and imposes a finite capacity on the learner's
auditory information processing. 7+/-2 is widely known as the number
of units of perceptual information that a person can store in
short-term memory (Miller, 1956). If a word in a sentence is taken
as a unit, this number fails to account for the fact that a person
can process much longer sentences with ease in a natural
environment. To resolve this seeming contradiction, the following
are hypothesized.
(1) A word is regarded as the smallest information unit.
(2) Speech recognition is restricted by a finite human channel capacity imposed
by short-term memory.
(3) When the number of the units reaches the upper limit, more information will
be obtained by expanding the capacity within each unit.
For example, the sentence "Why/ don't/ you/ meet/ me/ at/ 10/
o'clock/ in/ front/ of/ the/ library" contains 13 words, which
exceeds the proposed number of units. But if the words
are recognized in chunks as "Why don't you meet me/ at 10 o'clock/ in
front of the library," the number of units is reduced to 3 and hence
the burden of storing them in short-term memory is greatly reduced.
This, in other words, increases the efficiency of auditory information
processing.
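The unit counts in the example above can be checked with a minimal
sketch. The function name count_units and the constant
MILLER_UPPER_LIMIT are illustrative assumptions, not part of the
study; the slash-delimited segmentations are copied from the text.

```python
# Upper bound of the 7+/-2 span (Miller, 1956); used here only for illustration.
MILLER_UPPER_LIMIT = 9


def count_units(segmented: str) -> int:
    """Count the units in a sentence whose units are separated by '/'."""
    return len([unit for unit in segmented.split("/") if unit.strip()])


word_level = "Why/ don't/ you/ meet/ me/ at/ 10/ o'clock/ in/ front/ of/ the/ library"
chunk_level = "Why don't you meet me/ at 10 o'clock/ in front of the library"

for label, sentence in [("word", word_level), ("chunk", chunk_level)]:
    n = count_units(sentence)
    status = "within limit" if n <= MILLER_UPPER_LIMIT else "exceeds limit"
    print(label, n, status)
```

With word-level segmentation the sentence yields 13 units, above the
assumed limit of 9; with chunk-level segmentation it yields 3.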
{Experiment}
In order to make the latent and receptive process of speech
recognition observable, an elicited repetition task was given to the
subjects. This task is appropriate for examining the learner's
channel capacity, as observed in the overlap between input (a taped
sentence) and output (the subject's reproduction). The subjects
chosen for the study were
80 Japanese college students learning English. Thirty English
sentences were sampled from natural conversational American English.
The number of words in a sentence varied from 3 to 17. Vocabulary,
grammar and the topic of the sentences were carefully controlled so
that they would not pose any difficulties for the subjects. The test
sentences were tape-recorded by a native speaker (an American female).
The test, which took 6 minutes, was conducted in a language laboratory
and the subjects' reproductions were tape-recorded. The recorded
reproductions were scored by the tester according to the number of
words correctly reproduced. The acceptability criterion was the
minimum recognizability necessary to retrieve the original words,
pronunciation accuracy being of secondary importance.
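The scoring was done by a human tester, and the exact matching
procedure is not specified here. As a rough illustration only, the
sketch below counts the words of a transcribed reproduction that
match the target sentence in order (a longest common subsequence).
The function names, the in-order matching rule and the sample
reproduction are all assumptions made for this sketch.

```python
def tokenize(sentence: str) -> list[str]:
    """Lowercase the words and strip surrounding punctuation."""
    words = (w.strip(".,?!").lower() for w in sentence.split())
    return [w for w in words if w]


def words_reproduced(target: str, reproduction: str) -> int:
    """Count target words reproduced in order (longest common subsequence).

    Assumption: in-order matching stands in for the tester's judgment.
    """
    a, b = tokenize(target), tokenize(reproduction)
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, word_a in enumerate(a, 1):
        for j, word_b in enumerate(b, 1):
            if word_a == word_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


target = "Why don't you meet me at 10 o'clock in front of the library"
reproduction = "Why don't you meet me the library"  # hypothetical response
score = words_reproduced(target, reproduction)
print(score, "of", len(tokenize(target)), "words reproduced")
```

An automatic count of this kind ignores the recognizability judgment
described above, so it approximates the tester's score at best.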
{Results and Discussion}
The following results were obtained.
(1) Mechanical responses to the given sentences were found. The subjects
reproduced the beginning of the sentence most successfully, then the end. The
middle section of the sentence was the first to be dropped. These
characteristic responses to the sections of the sentence were confirmed with
test sentence variations in which an adverbial phrase was moved from the middle or
the end section to the beginning of the sentence. Comparison of
the results for the variations indicated that the subjects' responses were
determined by the location of the words in the sentence rather than the meaning
or function of the words.
(2) As the sentences got longer, some of the subjects started to respond only
to the end section of the sentence.
(3) Reproduction errors appeared characteristically among the more
successful subjects. Some errors resulted from semantic reprocessing
and others from syntactic reprocessing.
(4) The most successful subjects' reproductions were characterized by a
longer reproduction span.
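The positional pattern in result (1) can be tallied as in the sketch
below. It is an illustration under stated assumptions: the sentence
is split into three roughly equal sections, which is a choice made
here and not a boundary defined by the study, and the recall flags
are hypothetical data, not experimental results.

```python
def section_recall(recalled: list[bool]) -> dict[str, float]:
    """Proportion of words recalled in each section of one sentence.

    Assumption: sections are rough thirds; requires at least 3 words.
    """
    n = len(recalled)
    third = n // 3
    bounds = {
        "beginning": (0, third),
        "middle": (third, n - third),
        "end": (n - third, n),
    }
    return {
        name: sum(recalled[lo:hi]) / (hi - lo)
        for name, (lo, hi) in bounds.items()
    }


# Hypothetical flags for a 13-word sentence: True means the word was reproduced.
flags = [True, True, True, True,
         False, False, False, False, False,
         False, False, True, True]
print(section_recall(flags))
```

For these made-up flags the beginning is fully recalled, the middle
is dropped and half of the end is recalled, which mirrors the ordering
reported in result (1).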
It is safe to presume, therefore, that speech recognition is
constrained by a limitation imposed by short-term memory when the
subject's language proficiency is not sufficient. The fact that some
of the subjects responded only to the end section as the sentences got
longer is regarded as one of the strategies used to cope with the
limitation. Better performance is realized by recognizing words in a
longer span, which supports the hypothesis that the amount of
information is increased by expanding the capacity of each unit.
Also, speech recognition is more successfully implemented when
processing takes place at multiple levels, semantic and syntactic,
simultaneously, rather than at a phonetic level alone. Deploying
various kinds of linguistic knowledge can support accurate speech
recognition.
The findings regarding the speech recognition mechanism of second
language learners are meaningful for successful man-machine
interaction. The limitations that short-term memory imposes on a
person are a dimension that should be taken into account in
man-machine dialogue modeling.
Keywords: speech recognition, short-term memory, information unit, language acquisition