Abstract

This thesis describes a Computer-Assisted Pronunciation Learning (CAPL) system that helps non-native learners develop native-like pronunciation in a second language (L2). Traditionally, most L2 training focuses on reading, writing, and listening comprehension. Much less effort is devoted to developing pronunciation, either because it is considered less critical for communicating in L2 or simply because resources such as native or near-native speakers of the target language are lacking.

However, unlike vocabulary and grammar skills, native-like pronunciation is almost impossible to acquire by self-study. In general, non-native learners cannot perceive the similarities and differences between their native language (L1) and the target language (L2), especially when a phoneme of L2 does not exist in L1. In addition, pronunciation habits, once acquired, are persistent and extremely difficult to alter. Consequently, it is crucial to monitor critical errors in a learner's pronunciation and to give appropriate instruction at an early stage of pronunciation learning. There is therefore a strong demand for Computer-Assisted Language Learning (CALL), especially for the acquisition of natural pronunciation.

To be effective, such a system should address two aspects of pronunciation: the segmental aspects of speech (consonants and vowels) and the suprasegmental aspects (rhythm, stress, and intonation). First, it is essential to understand how the speech sounds of the target language are produced. Such systematic knowledge enables non-native learners to acquire native-like pronunciation of L2, and it is also a prerequisite for establishing the feedback procedures needed to correct learners' pronunciation problems. Second, to develop fluent and natural pronunciation of L2, the learner must be trained in the suprasegmental aspects of pronunciation, which are essential for the production of connected speech.

We selected Japanese as the target language, and designed and implemented our CAPL system in collaboration with a linguistics expert and language teachers, who contributed their expertise in phonetics and instruction. The primary goal of this research was to demonstrate the feasibility of a system based on speech recognition technologies in the CALL field. We have integrated a set of speech signal processing and recognition algorithms into a pronunciation instruction system. The CAPL system consists of training modules for consonant/vowel quality and speech rhythm. In this thesis, we present the following novel methods and algorithms:

Pronunciation Error Detection using Statistical Threshold Scores
Self-Corrective Feedback based on the Classification of Place and Manner of Articulation
Speech Rhythm Training using Rhythm Pattern Templates

Speech recognition techniques are key to solving many of these problems. However, standard speech recognition algorithms are not designed for pronunciation instruction. We have devised an automatic pronunciation assessment method designed to bring automatic scores into agreement with human ratings. For pronunciation problems, we provide feedback methods based on the place and manner of articulation for consonant sounds and on the vowel diagram for vowel sounds. Furthermore, we provide training for the suprasegmentals (prosodic features).

This thesis is organized as follows.

Chapter 1 presents the background and motivation of this research, the fundamental problems of traditional language learning, and the advantages of our CAPL system over traditional learning methods. It also discusses the technical considerations involved in designing and implementing the system.

Chapter 2 briefly discusses the phonetics and speech recognition techniques that are directly related to the CAPL system discussed throughout the thesis. Current phonetics is divided into articulatory phonetics and acoustic phonetics. The former considers how a given speech sound is produced in terms of the positions of the vocal organs, and the latter emphasizes the observable, measurable characteristics of the waveforms of speech sounds. Together they provide the theoretical basis for our approaches, which are combined with speech recognition techniques in the CAPL system. Finally, an overview of our CAPL system is given.

Chapter 3 introduces the automatic pronunciation assessment method using Hidden Markov Models (HMMs). Automatic scoring based on the HMM log-likelihood score is used to assess non-native learners' pronunciation. However, conventional automatic assessment methods do not always correlate well with human ratings. We focus on large drops in score, which suggest pronunciation errors. To detect pronunciation errors, we use threshold functions calculated from the means and standard deviations of scores in a speech database of native speakers. An experiment was performed to detect pronunciation errors arising from a mistake (M-set), a linguistic disparity (P-set), and prosodic transfer (T-set). The correlation between detected errors and human judgement shows the reliability of our methods.
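
The thresholding idea summarized for Chapter 3 can be illustrated with a minimal sketch. It is not the thesis's actual threshold function: the form mean minus k standard deviations, the multiplier `k`, the per-phoneme grouping, and the names `build_thresholds` and `detect_errors` are all assumptions made for illustration, and the scores are made-up numbers standing in for HMM log-likelihoods.

```python
from statistics import mean, stdev


def build_thresholds(native_scores, k=2.0):
    """Derive one threshold per phoneme from native speakers' scores.

    native_scores: dict mapping a phoneme label to a list of scores
    (stand-ins for HMM log-likelihoods). The threshold form
    mean - k * std and the value of k are assumptions.
    """
    return {
        phoneme: mean(scores) - k * stdev(scores)
        for phoneme, scores in native_scores.items()
    }


def detect_errors(learner_scores, thresholds):
    """Return the indices of learner segments scoring below the threshold.

    learner_scores: list of (phoneme, score) pairs in utterance order.
    """
    return [
        index
        for index, (phoneme, score) in enumerate(learner_scores)
        if score < thresholds[phoneme]
    ]


# Made-up native scores for two phonemes.
native = {
    "a": [-10.0, -12.0, -11.0],
    "k": [-20.0, -22.0, -21.0],
}
thresholds = build_thresholds(native, k=2.0)

# A learner utterance: the second and third segments fall well below
# the native range and are flagged as candidate pronunciation errors.
learner = [("k", -21.5), ("a", -15.0), ("k", -24.0)]
flagged = detect_errors(learner, thresholds)
```

Flagging only scores that fall far below the native distribution reflects the focus on large score degradation rather than on the raw score itself.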

Chapter 4 explains automatic pronunciation instruction based on the classification of place and manner of articulation. Most non-native learners cannot work out how to correct their articulation even when critical errors in their pronunciation are pointed out. In particular, a feedback method is needed for errors that result from linguistic gaps between L1 and L2 in pronunciation. As self-corrective feedback, we plot the formant frequencies of a learner's vowel sound on a vowel diagram mapped onto typical tongue positions, and we classify the frames of a learner's consonant sound into categories of place and manner of articulation, in order to identify the cause of the learner's articulation problems.

Chapter 5 introduces a training method for mora-timed speech rhythm using rhythm pattern templates. A loosely defined template based on the presence or absence of voicing and a tightly defined template based on the manner of articulation are used to develop learners' mora rhythm effectively. Standard rhythm ranges were calculated from the means and standard deviations of mora rhythms measured from the speech of native speakers. In this system, each mora rhythm is measured relatively, as the ratio of its length to the word length, rather than in milliseconds. An experiment was performed to detect learners' speech rhythm errors using the rhythm pattern templates. The effectiveness of the rhythm pattern templates is discussed through an example of training.
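
The relative mora measurement summarized for Chapter 5 can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: it keys the standard ranges by mora position within a single word instead of by rhythm pattern template, it assumes a range of mean plus or minus k standard deviations, and the names `mora_ratios`, `standard_ranges`, and `rhythm_errors` and all durations are hypothetical.

```python
from statistics import mean, stdev


def mora_ratios(durations):
    """Convert mora durations to ratios of the whole word length."""
    total = sum(durations)
    return [duration / total for duration in durations]


def standard_ranges(native_utterances, k=2.0):
    """Compute a (low, high) ratio range for each mora position.

    native_utterances: list of per-mora duration lists for the same word,
    one list per native speaker. The mean +/- k * std form is an assumption.
    """
    ratios = [mora_ratios(utterance) for utterance in native_utterances]
    ranges = []
    for position_ratios in zip(*ratios):
        center = mean(position_ratios)
        spread = k * stdev(position_ratios)
        ranges.append((center - spread, center + spread))
    return ranges


def rhythm_errors(learner_durations, ranges):
    """Return the positions of moras whose ratio lies outside the range."""
    learner = mora_ratios(learner_durations)
    return [
        index
        for index, (ratio, (low, high)) in enumerate(zip(learner, ranges))
        if not low <= ratio <= high
    ]


# Made-up durations (any consistent unit) for a three-mora word.
native = [[100, 110, 90], [110, 90, 100], [90, 100, 110]]
ranges = standard_ranges(native, k=2.0)

even = rhythm_errors([100, 100, 100], ranges)
uneven = rhythm_errors([60, 115, 125], ranges)
```

Because each mora is expressed as a ratio of the word length, a learner who speaks slowly but evenly is not penalized; only the relative timing of the moras is compared with the native range.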

The final chapter, Chapter 6, discusses the feasibility of applying speech recognition techniques to the CAPL system and outlines future work.


Jo Chul-Ho
Wed Oct 13 17:59:27 JST 1999