Figure 4.1 shows a block diagram of automatic pronunciation instruction as part of the CAPL system. For every phoneme detected as a pronunciation error, we isolate the most stable frames of its segment using the HMM segmenter. Feedback is then generated from the vowel diagram for vowel sounds and from the classification of the place and manner of articulation for consonant sounds. We adopt different feedback strategies for consonants and vowels because they are articulated in fundamentally different ways.
For vowel errors, we directly measure the formant frequencies and give feedback by plotting them on the articulatory vowel diagram, which maps onto the open/close position of the jaw and the front/back position of the tongue. One central frame of the corresponding segment is extracted, its formant frequencies are measured by the formant measurement module, and the result is plotted on the vowel diagram to show how the vowel was articulated.
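The vowel feedback step can be sketched as follows. This is a minimal illustration, not the CAPL implementation: it assumes the measured values are the first two formants (F1 and F2), relies on the standard phonetic convention that a higher F1 corresponds to a more open vowel and a higher F2 to a more front vowel, and uses hypothetical axis ranges chosen only for the example. The function names `central_frame_index` and `vowel_diagram_position` are likewise hypothetical.

```python
def central_frame_index(start, end):
    """Return the index of the middle frame of a segment spanning frames [start, end)."""
    if end <= start:
        raise ValueError("segment must contain at least one frame")
    return start + (end - start) // 2


def vowel_diagram_position(f1_hz, f2_hz,
                           f1_range=(200.0, 1000.0),   # hypothetical axis range (Hz)
                           f2_range=(500.0, 3000.0)):  # hypothetical axis range (Hz)
    """Map an assumed (F1, F2) pair to (backness, openness), each in [0, 1].

    backness: 0 = front, 1 = back   (higher F2 means more front)
    openness: 0 = close, 1 = open   (higher F1 means more open)
    """
    def normalise(value, low, high):
        # Scale to [0, 1] and clip values that fall outside the axis range.
        return min(1.0, max(0.0, (value - low) / (high - low)))

    openness = normalise(f1_hz, *f1_range)
    backness = 1.0 - normalise(f2_hz, *f2_range)
    return backness, openness
```

Given the segment boundaries from the segmenter, `central_frame_index` picks the frame to measure, and the resulting point can be drawn on the diagram next to the target vowel so the learner sees in which direction the jaw or tongue position should change.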
For consonant errors, the central three frames of the phonemic segment are input to the Pair-Wise classifiers: one preceding frame, one central frame, and one following frame are extracted to capture dynamic features. The pattern is then classified separately by place of articulation and by manner of articulation. For instance, the phoneme /sh/ should be classified as Post-alveolar for the place of articulation and Fricative for the manner of articulation (see Table 2.1). If a classification result does not match the category of the corresponding phoneme, the mismatch identifies the cause of the articulation problem.
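The consonant diagnosis logic can be sketched in the same spirit. This is an illustrative outline under stated assumptions: the classifier outputs are passed in as plain strings rather than produced by real Pair-Wise classifiers, the lookup table contains only the /sh/ entry given in the text (the full mapping is in Table 2.1), and the names `context_frames` and `diagnose` are hypothetical.

```python
# Expected (place, manner) per phoneme. Only /sh/ is taken from the text;
# the remaining entries would come from Table 2.1.
EXPECTED_CATEGORIES = {"sh": ("Post-alveolar", "Fricative")}


def context_frames(frames, center):
    """Return the preceding, central, and following frames around index `center`."""
    if center < 1 or center + 1 >= len(frames):
        raise ValueError("central frame needs one neighbour on each side")
    return frames[center - 1:center + 2]


def diagnose(phoneme, classified_place, classified_manner,
             expected=EXPECTED_CATEGORIES):
    """Compare classifier outputs with the expected categories of `phoneme`.

    Returns a list of (aspect, expected, classified) tuples, one per mismatch.
    An empty list means both place and manner match.
    """
    expected_place, expected_manner = expected[phoneme]
    problems = []
    if classified_place != expected_place:
        problems.append(("place", expected_place, classified_place))
    if classified_manner != expected_manner:
        problems.append(("manner", expected_manner, classified_manner))
    return problems
```

For example, if a learner's /sh/ were classified as Alveolar and Fricative, `diagnose` would report a single place-of-articulation mismatch, which is the information the feedback needs to tell the learner what to correct.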