3.4.2 Scoring

The pronunciation scoring algorithms are based on phonemic time alignments generated by the speech recognition system. In this application, the transcription of the utterance is known, because the system prompts the learner to utter a given word or sentence. The HMMs trained on the database of native speakers are therefore used to generate phonetic time alignments of the learner's speech against the known transcription. Using these alignments and the native-trained HMMs, the system computes scores that rely on phoneme-level statistics. Because the models are trained on native speech, the likelihood of the learner's speech under them is a good measure of the similarity between the model speech and the learner's speech. From the above segmentations, we use the following probability measure to obtain a score for each phoneme segment. For each segment, the HMM log-likelihood score S is calculated as

$$ S = \frac{1}{d} \sum_{t=t_0}^{t_0+d-1} \log p(y_t \mid q_t) $$

where $p(y_t \mid q_t)$ is the likelihood of the current frame with observation vector $y_t$ given the aligned HMM state $q_t$, d is the duration (in frames) of the phoneme segment, and $t_0$ is the starting frame index of the phoneme segment. Dividing by d removes the dependency of the pronunciation score on the duration of the phoneme: an unnormalized HMM log-likelihood is accumulated over all frames of the segment, so the longer the segment is, the lower the score becomes. An example of these scores is shown in the bottom window of Figure 3.3. The /silB/ and /silE/ symbols make the model more robust by absorbing noise at the beginning and end of the utterance. In this example, the learner's /sh/ score is noticeably lower than the model speaker's score.
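The duration-normalized segment score can be illustrated with a minimal sketch. It assumes that the per-frame log-likelihoods and the phoneme alignment have already been produced by the recognizer; the function names `segment_score` and `score_alignment`, the `(phoneme, t0, d)` tuple layout, and the numeric values are hypothetical and chosen only for illustration, not taken from the system described here.

```python
from typing import List, Sequence, Tuple


def segment_score(frame_log_likelihoods: Sequence[float], t0: int, d: int) -> float:
    """Average per-frame log-likelihood of one phoneme segment.

    frame_log_likelihoods: log-likelihood of every frame of the utterance
                           (assumed to come from the native-trained HMMs).
    t0: starting frame index of the segment.
    d:  duration of the segment in frames.
    """
    if d <= 0:
        raise ValueError("segment duration d must be positive")
    if t0 < 0 or t0 + d > len(frame_log_likelihoods):
        raise ValueError("segment lies outside the utterance")
    # Sum the frame log-likelihoods over t0 .. t0+d-1, then divide by d
    # so that long and short phonemes are scored on the same scale.
    return sum(frame_log_likelihoods[t0:t0 + d]) / d


def score_alignment(
    frame_log_likelihoods: Sequence[float],
    alignment: Sequence[Tuple[str, int, int]],
) -> List[Tuple[str, float]]:
    """Score every (phoneme, t0, d) segment of a time alignment."""
    return [
        (phoneme, segment_score(frame_log_likelihoods, t0, d))
        for phoneme, t0, d in alignment
    ]


# Made-up frame log-likelihoods and alignment, for illustration only.
example_frames = [-2.0, -4.0, -6.0, -6.0, -1.0, -3.0]
example_alignment = [("silB", 0, 2), ("sh", 2, 2), ("silE", 4, 2)]
example_scores = score_alignment(example_frames, example_alignment)
```

With these made-up values, the /sh/ segment receives the lowest average score of the three segments, which is the kind of per-phoneme comparison that the score display is meant to support.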


Jo Chul-Ho
Wed Oct 13 17:59:27 JST 1999