3.2.2 Assessment Strategy

Figure 3.1: Relationship among a non-native speaker, a model speaker, and speaker-independent phoneme models

In the initial phase, the system assesses the learner's pronunciation from two perspectives: perceptual and mathematical. The HMM (Hidden Markov Model) likelihood score serves as an objective, mathematical measure, whereas the visualized acoustic features serve as a perceptual reference. This relationship is shown in Figure 3.1. Our discussion, however, focuses on the mathematical evaluation rather than the perceptual one.

As described earlier, automatic scoring methods do not always correlate well with perceptual judgement because of speaker variability[19]. To address this problem, we assess pronunciation quality with an automatic score based on the HMM log-likelihood normalized by its duration. We then define threshold functions to detect significant drops in this score, which suggest pronunciation errors. The errors detected in this way are also verified by human listeners to determine the reliability of the automatic scoring method[20]. To achieve text-independent pronunciation scoring, an extensive speech corpus was designed and collected.
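
The duration-normalized score and the threshold test can be sketched roughly as follows. This is only an illustration under stated assumptions, not the CAPL implementation: the names Segment, normalized_score, and detect_errors, the per-phoneme reference statistics, and the threshold form (reference mean minus k standard deviations) are all hypothetical, since the threshold functions themselves are not specified in this section.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    phoneme: str           # phoneme label taken from the transcription
    log_likelihood: float  # total HMM log-likelihood over the segment
    n_frames: int          # segment duration in frames


def normalized_score(seg: Segment) -> float:
    """Log-likelihood per frame, so short and long segments are comparable."""
    if seg.n_frames <= 0:
        raise ValueError("segment must span at least one frame")
    return seg.log_likelihood / seg.n_frames


def detect_errors(segments, ref_stats, k=2.0):
    """Flag segments whose normalized score drops below a threshold.

    ref_stats maps phoneme -> (mean, std) of normalized scores.
    Both ref_stats and the rule `mean - k * std` are assumptions made
    for this sketch; the source does not define its threshold functions.
    """
    flagged = []
    for seg in segments:
        mean, std = ref_stats[seg.phoneme]
        score = normalized_score(seg)
        if score < mean - k * std:
            flagged.append((seg.phoneme, score))
    return flagged


# Made-up numbers, purely for illustration.
segments = [
    Segment("r", -900.0, 10),    # -90.0 per frame
    Segment("ay", -1200.0, 20),  # -60.0 per frame
    Segment("t", -330.0, 6),     # -55.0 per frame
]
ref_stats = {"r": (-60.0, 5.0), "ay": (-58.0, 4.0), "t": (-57.0, 3.0)}
errors = detect_errors(segments, ref_stats)
```

With these invented values only the "r" segment falls below its threshold, so it alone would be passed on for verification by human listeners.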

Figure 3.2: Block diagram of automatic pronunciation assessment

Figure 3.2 shows a block diagram of the speech recognition processing performed in the CAPL system. Automatic scoring in the CAPL system consists of three modules: (1) Acoustic Analysis, (2) Phonemic Segmentation and Scoring, and (3) Pronunciation Error Detection.

First, visualized acoustic features are provided so that the learner's pronunciation can be compared with the model pronunciation. To capture these features, we perform four kinds of acoustic analysis that are generally considered important in language learning: acoustic waveform, LPC spectrum, pitch, and power. Next, phonemic segmentation is obtained for each target word or sentence using the Viterbi algorithm, along with the log-likelihood of each segment. The segmentation is performed against a given transcription so that it is accurate. The segmentation is also displayed on the screen to help learners compare their own acoustic features with the model's. Finally, for each detected pronunciation error, feedback instruction based on articulatory configuration is given in the next phase, which is discussed in the next chapter.
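
Viterbi segmentation against a known transcription can be illustrated with a much simplified sketch. It assumes that frame-level log-likelihoods are already available as a matrix with one column per phoneme of the transcription, that each phoneme is a single state, and that transitions carry no penalty. None of these simplifications come from the CAPL system, whose HMM topology is not described here, and the function name forced_align is hypothetical.

```python
import numpy as np


def forced_align(frame_loglik):
    """Segment T frames into N phonemes that must occur in a fixed order.

    frame_loglik[t, n] is the log-likelihood of frame t under the n-th
    phoneme of the transcription (assumed given). Returns one tuple
    (start_frame, end_frame_exclusive, segment_log_likelihood) per phoneme.
    """
    frame_loglik = np.asarray(frame_loglik, dtype=float)
    T, N = frame_loglik.shape
    if T < N:
        raise ValueError("need at least one frame per phoneme")

    best = np.full((T, N), -np.inf)           # best path score ending at (t, n)
    advanced = np.zeros((T, N), dtype=bool)   # True if path came from n - 1
    best[0, 0] = frame_loglik[0, 0]           # path must start in phoneme 0

    for t in range(1, T):
        for n in range(N):
            stay = best[t - 1, n]
            move = best[t - 1, n - 1] if n > 0 else -np.inf
            advanced[t, n] = move > stay
            best[t, n] = max(stay, move) + frame_loglik[t, n]

    # Backtrace from the last phoneme at the last frame.
    state_of_frame = np.empty(T, dtype=int)
    n = N - 1
    for t in range(T - 1, -1, -1):
        state_of_frame[t] = n
        if advanced[t, n]:
            n -= 1

    segments = []
    for n in range(N):
        frames = np.flatnonzero(state_of_frame == n)
        start, end = int(frames[0]), int(frames[-1]) + 1
        seg_ll = float(frame_loglik[start:end, n].sum())
        segments.append((start, end, seg_ll))
    return segments


# Toy input: the first three frames fit phoneme 0, the last three phoneme 1.
example = [[-1, -5], [-1, -5], [-1, -5], [-5, -1], [-5, -1], [-5, -1]]
alignment = forced_align(example)
```

Each returned segment carries its own log-likelihood and frame count, which is the information needed to compute a duration-normalized score per phoneme.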

 

Figure 3.3: Example of the user interface window for the visual representation module



Jo Chul-Ho
Wed Oct 13 17:59:27 JST 1999