The basic assumption behind the automatic scoring algorithms proposed in this study is that the quality of non-native learners' speech can be evaluated from acoustic scores computed with a speech recognizer trained on native speech. Based on the scoring results, we examined the learners' inveterate pronunciation errors and found three types of mispronunciation.
In the experiment, inveterate pronunciation errors were successfully detected by means of local and global thresholds. This method showed a better correlation with human judgement than the other methods we examined.
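The threshold-based detection can be illustrated with a minimal sketch. This is not the implementation used in this study: it assumes that a "global" threshold is a single value shared by all phones, that a "local" threshold is a phone-specific value that takes precedence when available, and that a higher acoustic score means a more native-like pronunciation. The function name `detect_errors` and all numeric values are hypothetical.

```python
from typing import Dict, List, Tuple


def detect_errors(
    scores: List[Tuple[str, float]],
    global_threshold: float,
    local_thresholds: Dict[str, float],
) -> List[str]:
    """Return the phones whose score falls below the applicable threshold.

    Assumption (not taken from the study): a phone-specific (local)
    threshold overrides the global one when it is defined.
    """
    flagged = []
    for phone, score in scores:
        # Use the local threshold if one exists, otherwise the global one.
        threshold = local_thresholds.get(phone, global_threshold)
        if score < threshold:
            flagged.append(phone)
    return flagged


# Hypothetical scores (higher is assumed to be better).
example_scores = [("o:", -9.0), ("a", -3.0), ("e:", -5.0)]
flagged_phones = detect_errors(
    example_scores,
    global_threshold=-6.0,
    local_thresholds={"e:": -4.0},
)
```

In this toy example, `o:` is flagged by the global threshold and `e:` by its stricter local threshold, while `a` passes.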
Apart from the eight types of pronunciation, it is noteworthy that long vowels such as /u:/, /e:/, and /o:/ show a large degradation in their scores, as shown in Figure 3.5 and Figure 3.6. We found an interesting phenomenon: non-native speakers pronounce such a vowel as a diphthong, whereas many native speakers pronounce it as a long vowel, e.g., [shourai] instead of [sho:rai] for the word shourai. In modern Japanese, many diphthongs are pronounced as long vowels regardless of the orthographic notation, e.g., [ei] → [e:], [ou] → [o:], etc. [25].
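The correspondence between orthographic diphthongs and long vowels mentioned above can be written as a simple rewrite rule. The sketch below is only an illustration under stated assumptions: it works on a romanized transcription, the rule table `LONG_VOWEL_RULES` and the function `to_long_vowels` are hypothetical names, and the rules are applied unconditionally, although in practice not every such vowel sequence is realized as a long vowel.

```python
# Hypothetical rule table: orthographic diphthong -> long vowel.
LONG_VOWEL_RULES = {"ei": "e:", "ou": "o:"}


def to_long_vowels(romanized: str) -> str:
    """Rewrite every listed diphthong in a romanized word as a long vowel.

    Simplification: the rules are applied everywhere, with no check for
    morpheme boundaries or lexical exceptions.
    """
    result = romanized
    for diphthong, long_vowel in LONG_VOWEL_RULES.items():
        result = result.replace(diphthong, long_vowel)
    return result


native_like = to_long_vowels("shourai")
```

Here `to_long_vowels("shourai")` yields `sho:rai`, the long-vowel form attributed to many native speakers, whereas the unmodified string corresponds to the diphthong form observed for non-native speakers.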