3.1 Introduction

This work presents a method for assessing the quality of pronunciation and detecting pronunciation errors, as part of an effort to develop a Computer-Assisted Pronunciation Learning (CAPL) system for non-native learners of Japanese. The goal of this research is to investigate the feasibility of applying speech recognition based on Hidden Markov Models (HMMs) to L2 (second language) pronunciation learning.

First, it may be useful to ask: "What is good pronunciation?" Most people would probably find this question difficult to answer. One reason is that features such as speed, tone, intonation, and volume tend to be regarded as characteristics that identify people within the same language culture. In second language (L2) teaching, however, the term fluency is commonly used, which suggests a general, if informal, agreement about what good pronunciation means. As a rule, we judge the pronunciation of non-native speakers as good or poor with reference to the (standard) target language. Unfortunately, no exact set of norms has yet been established to define it precisely.

A number of studies have addressed the problem of automatically rating non-native learners by providing measurements that correlate with human judgment. However, many previous works lacked a reliable assessment of pronunciation quality because they relied on general speech processing techniques that were not designed for speech quality assessment. As a result, there were disparities between automatic scores and human ratings. Closing this gap requires new methods and algorithms that match human judgment in assessing pronunciation quality. Recently, automatic pronunciation assessment based on HMMs has shown good results [17], and these advances in speech recognition technology are what make our approach to automatic pronunciation assessment possible.
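Agreement between automatic scores and human ratings is usually quantified with a correlation coefficient. The following is a minimal sketch assuming Pearson's r, a common choice; the coefficient is not specified in this section. The function name `pearson_r` and the sample values are hypothetical and serve only as illustration, not as data from this study.

```python
import math
from typing import Sequence


def pearson_r(machine: Sequence[float], human: Sequence[float]) -> float:
    """Pearson correlation between machine scores and human ratings.

    Each index is one utterance (or speaker): machine[i] is its automatic
    score and human[i] is the rating a human judge gave it.
    """
    if len(machine) != len(human) or len(machine) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(machine)
    mean_m = sum(machine) / n
    mean_h = sum(human) / n
    # Covariance and variances; the common 1/n factors cancel in the ratio.
    cov = sum((m - mean_m) * (h - mean_h) for m, h in zip(machine, human))
    var_m = sum((m - mean_m) ** 2 for m in machine)
    var_h = sum((h - mean_h) ** 2 for h in human)
    if var_m == 0 or var_h == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_m * var_h)


if __name__ == "__main__":
    # Hypothetical, illustrative values only (not results of this work):
    # machine scores as average log-likelihoods, human ratings on a 1-5 scale.
    machine_scores = [-62.1, -58.4, -71.9, -55.0, -66.3]
    human_ratings = [3, 4, 2, 5, 3]
    print(f"r = {pearson_r(machine_scores, human_ratings):.3f}")
```

A value of r close to 1 means the machine score ranks learners much as the human judges do. Such a score is a candidate measurement for the CAPL system.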

Our pronunciation scoring uses HMMs to generate a phonemic segmentation of the learner's speech. Machine scores are then computed from the HMM log-likelihoods of these segments. Because the scores are computed with native acoustic models, they represent the degree of match between the native models and the learner's speech. The effectiveness of each machine score is verified by its correlation with human judgment. In this work, we also investigate techniques for detecting pronunciation errors on the basis of machine scores, and we developed threshold functions that allow us to detect mispronounced phonemes. A practical CAPL system requires highly reliable automatic assessment, so we investigated the reliability of these techniques for three tasks: Specific Mistakes (M-set), Linguistic Disparity (P-set), and Prosodic Transfer (T-set).
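The scoring and error-detection pipeline described above can be sketched as follows. This is a minimal sketch under stated assumptions. It assumes the HMM forced alignment already yields, for each phoneme, its log-likelihood summed over frames and its duration in frames. It uses a duration-normalized log-likelihood per phoneme, a common choice in the literature, and a simple illustrative threshold of k standard deviations below the native mean. Neither is claimed to be the specific machine score or threshold function developed in this work. `PhoneSegment`, `phone_score`, `utterance_score`, `native_threshold`, and `detect_errors` are hypothetical names.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class PhoneSegment:
    """One phoneme from a (hypothetical) HMM forced alignment of the learner's utterance."""
    phone: str             # phoneme label, e.g. "a", "k", "N"
    log_likelihood: float  # log-likelihood under the native model, summed over the segment's frames
    num_frames: int        # segment duration in frames


def phone_score(seg: PhoneSegment) -> float:
    """Duration-normalized log-likelihood, so long and short phonemes are comparable."""
    if seg.num_frames <= 0:
        raise ValueError("segment must contain at least one frame")
    return seg.log_likelihood / seg.num_frames


def utterance_score(segments: Sequence[PhoneSegment]) -> float:
    """Machine score for the whole utterance: mean of the per-phoneme scores."""
    if not segments:
        raise ValueError("no segments")
    return sum(phone_score(s) for s in segments) / len(segments)


def native_threshold(native_scores: Sequence[float], k: float = 2.0) -> float:
    """Hypothetical threshold rule: k standard deviations below the mean
    score that native speakers obtain for the same phoneme."""
    return statistics.fmean(native_scores) - k * statistics.pstdev(native_scores)


def detect_errors(segments: Sequence[PhoneSegment],
                  thresholds: Dict[str, float],
                  default_threshold: float) -> List[int]:
    """Indices of phonemes whose score falls below their phoneme's threshold."""
    return [i for i, seg in enumerate(segments)
            if phone_score(seg) < thresholds.get(seg.phone, default_threshold)]


if __name__ == "__main__":
    # Hypothetical alignment output for a three-phoneme utterance.
    segs = [PhoneSegment("k", -40.0, 8),
            PhoneSegment("a", -90.0, 10),
            PhoneSegment("N", -30.0, 6)]
    print(utterance_score(segs))                   # mean of -5.0, -9.0, -5.0
    print(detect_errors(segs, {"a": -7.0}, -8.0))  # [1]: only "a" is flagged
```

Dividing by duration keeps a long vowel from being penalized merely for spanning more frames. Per-phoneme thresholds allow for the fact that some phonemes score lower than others even when spoken by natives.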

In this chapter, we address two points: 1) measurements that correlate well with human perception, and 2) thresholds that can detect pronunciation errors.



Jo Chul-Ho
Wed Oct 13 17:59:27 JST 1999