3.3 Visual Representation of Acoustic Features

Figure 3.3 shows a typical example of the user interface window of the visual representation module; in this case, a learner who has studied Japanese for one year is practicing the target word [shurui] (kind or type). The model acoustic information taken from a native speaker is displayed on the left-hand side, and the learner's acoustic information on the right-hand side. The four subwindows are (from top to bottom) the acoustic waveform window, the LPC spectrogram window, the pitch window, and the power window.

A sequence of speech samples is shown in the acoustic waveform window; the peak amplitude of the signal varies considerably as the excitation changes between voiced and unvoiced speech. The acoustic waveform is used to compare the rhythm and the speed at which the speech is uttered. The spectrogram clearly shows the pattern of resonances, called formants, with frequency (from 0 to 8 kHz) on the vertical axis and time on the horizontal axis. The intensity of any given frequency component at any time is indicated by the darkness of the corresponding spot. The pitch represents the accent of the speech, and in the pitch window it is easy to recognize that the learner's word accent is slightly flatter than the model's. The power signifies the volume (dB) of the speech.
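
As a rough illustration, the four kinds of acoustic information shown in the subwindows might be computed from a recorded utterance as sketched below. The thesis does not specify its implementation; Python with librosa, the 16 kHz sampling rate, the 25 ms frame / 10 ms hop, and the use of an FFT-based short-time spectrogram as a stand-in for the LPC spectrogram are all assumptions made for this sketch.

    import numpy as np
    import librosa

    def extract_features(wav_path, sr=16000, frame_len=0.025, hop=0.010):
        """Return waveform, spectrogram (dB), F0 contour, and frame power (dB)."""
        y, _ = librosa.load(wav_path, sr=sr)        # acoustic waveform
        n_fft = int(frame_len * sr)                 # 400 samples (25 ms)
        hop_length = int(hop * sr)                  # 160 samples (10 ms)

        # Short-time spectrogram, 0 .. sr/2 = 8 kHz; darker spot = stronger component.
        # (FFT-based here; the module itself displays an LPC spectrogram.)
        S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
        spec_db = librosa.amplitude_to_db(S, ref=np.max)

        # Fundamental frequency (pitch) contour; unvoiced frames come back as NaN.
        f0, voiced_flag, _ = librosa.pyin(
            y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"),
            sr=sr, hop_length=hop_length)

        # Short-time power in dB, corresponding to the power window.
        rms = librosa.feature.rms(y=y, frame_length=n_fft, hop_length=hop_length)[0]
        power_db = 20 * np.log10(rms + 1e-10)

        return y, spec_db, f0, power_db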

By means of the visual and optional auditory feedback given in the module, efficient training of both the perception and the production of the L2 became feasible. The visual feedback was instructive and pedagogically effective because it was logical, motivating, and shown in real time. As shown in Figure 3.3, the left portion shows the correctly produced utterance of the native model speaker, and the right portion shows the deviant production of a non-native learner. Contrasting the visual patterns enabled the learner to discriminate between the distinctive features that underlie phonological contrasts in Japanese. The purpose of this visual representation module is to help learners find important cues about their errors by allowing them to visually compare their utterances with a standardized model instantaneously [21].
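
A side-by-side display in the spirit of Figure 3.3 could then be assembled from these features, for example with matplotlib; the function below is only a sketch built on the hypothetical extract_features() helper above, not the module's actual plotting code, and the file names are placeholders.

    import numpy as np
    import matplotlib.pyplot as plt

    def compare(model_wav, learner_wav, sr=16000, hop=0.010):
        """Plot model features (left column) and learner features (right column)."""
        fig, axes = plt.subplots(4, 2, figsize=(10, 8), sharex="col")
        labels = ["waveform", "spectrogram", "pitch (Hz)", "power (dB)"]
        for col, wav in enumerate((model_wav, learner_wav)):
            y, spec_db, f0, power_db = extract_features(wav, sr=sr)
            dur = len(y) / sr
            axes[0][col].plot(np.arange(len(y)) / sr, y, lw=0.5)
            axes[1][col].imshow(spec_db, origin="lower", aspect="auto",
                                extent=[0, dur, 0, sr / 2], cmap="gray_r")
            axes[2][col].plot(np.arange(len(f0)) * hop, f0, ".")
            axes[3][col].plot(np.arange(len(power_db)) * hop, power_db)
            axes[3][col].set_xlabel("time (s)")
        for row, label in enumerate(labels):
            axes[row][0].set_ylabel(label)
        axes[0][0].set_title("native model")
        axes[0][1].set_title("learner")
        plt.tight_layout()
        plt.show()

    compare("model_shurui.wav", "learner_shurui.wav")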

