Human Judgement

The basic assumption behind the automatic scoring algorithms proposed in this study is that the quality of non-native learners' speech can be evaluated from acoustic scores computed with a speech recognizer trained on native speech. Based on the scoring results, we examined the learners' inveterate pronunciation errors and found three types of mispronunciation.

(1): Learners are not accustomed to pronouncing voiceless sounds such as /ch/ and /ts/ after /N/ or between vowels (TYPE 7 and TYPE 8).
(2): /s/ shows a large difference between native speakers and non-native learners; non-native speakers are generally known to pronounce /s/ for /sh/ or /sh/ for /s/ (TYPE 2).
(3): Learner H scored very poorly on /j/. We confirmed that he mispronounced /j/ in both ojigi and jitsuryoku (TYPE 3).

In the experiment, inveterate pronunciation errors were successfully detected by means of local and global thresholds. We found that this method correlated better with human judgement than any other method.
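The threshold-based detection and the comparison with human judgement can be illustrated with a minimal sketch. It is not the implementation used in this study. It assumes that a local threshold is set per phoneme and a global threshold is set over all phonemes pooled, each as the mean of the native scores minus k standard deviations. The function names (thresholds, flag_errors, pearson), the parameter k, and the input layout are hypothetical.

```python
from statistics import mean, stdev


def thresholds(native_scores, k=2.0):
    """Derive per-phoneme (local) and pooled (global) thresholds.

    native_scores: dict mapping phoneme -> list of native acoustic scores.
    Assumption: a threshold is mean - k * standard deviation.
    """
    local = {p: mean(s) - k * stdev(s) for p, s in native_scores.items()}
    pooled = [x for s in native_scores.values() for x in s]
    global_thr = mean(pooled) - k * stdev(pooled)
    return local, global_thr


def flag_errors(learner_scores, local, global_thr):
    """Return phoneme -> list of thresholds ("local", "global") that the
    learner's mean score for that phoneme falls below."""
    flagged = {}
    for p, s in learner_scores.items():
        m = mean(s)
        hits = []
        if p in local and m < local[p]:
            hits.append("local")
        if m < global_thr:
            hits.append("global")
        if hits:
            flagged[p] = hits
    return flagged


def pearson(xs, ys):
    """Pearson correlation between machine scores and human ratings."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Under these assumptions, a phoneme that a learner repeatedly scores below the native range is reported together with the threshold it violates, and pearson gives one possible way to compare the resulting machine scores with the ratings assigned by human judges.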

Apart from the eight types of pronunciation, it is noteworthy that the scores of long vowels such as /u:/, /e:/, and /o:/ degrade considerably, as shown in Figure 3.5 and Figure 3.6. We found an interesting phenomenon: non-native speakers pronounce such a vowel as a diphthong, whereas many native speakers pronounce it as a long vowel, e.g., learners say [shourai] where native speakers say [sho:rai] for the word shourai. In modern Japanese, many diphthongs are pronounced as long vowels regardless of the orthographic notation, e.g., [ei] → [e:], [ou] → [o:], etc. [25].
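The diphthong/long-vowel alternation can be made concrete with a small sketch that derives the long-vowel reading from a romanized orthographic form. This is an illustration only. It assumes romanized input, covers only the two mappings named above, and ignores morpheme boundaries, across which the vowels may remain distinct. The function name long_vowel_variant is hypothetical.

```python
# Mappings taken from the text: [ei] -> [e:], [ou] -> [o:].
LONG_VOWEL_MAP = {"ei": "e:", "ou": "o:"}


def long_vowel_variant(romaji):
    """Return the long-vowel pronunciation of a romanized word.

    Simplification: every occurrence of "ei" or "ou" is rewritten,
    with no check for morpheme boundaries.
    """
    result = romaji
    for diphthong, long_vowel in LONG_VOWEL_MAP.items():
        result = result.replace(diphthong, long_vowel)
    return result


print(long_vowel_variant("shourai"))  # sho:rai
```

A scorer could compare both the orthographic form and the variant returned here against the learner's utterance, which is one way to separate diphthong-like realizations from native-like long vowels.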


Jo Chul-Ho
Wed Oct 13 17:59:27 JST 1999