2.4 The CAPL System Overview

One development goal of the CAPL system is to integrate resources and expertise, ranging from speech recognition to pedagogical design, in order to support automatic pronunciation learning. Another goal is to exploit state-of-the-art speech recognition technology for automatic pronunciation scoring of learners' speech, providing feedback that is validated by human expert raters. A schematic diagram of the CAPL system is shown in Figure 2.11.

Figure 2.11: Schematic diagram of the CAPL system

The system consists of several modules for training two aspects of pronunciation: the segmental and the suprasegmental. Our emphasis has been on the segmental rather than the suprasegmental domain. This does not mean that segmental problems are more important, only that they are theoretically better established, whereas the suprasegmental aspects still require extensive work. The system includes the traditional training methods discussed in section 1.1.3 and adds novel methods that we discuss in detail in the next three chapters: automatic pronunciation scoring and error detection (chapter 3), articulation instruction (chapter 4), and speech rhythm measurement and error detection (chapter 5).

The non-native learners of Japanese who took part in the experiments are listed in Table 2.3. The learners were selected according to variables such as language background, age, and gender. Their speech was analyzed under the acoustic feature extraction conditions shown in Table 2.4.

**Table 2.3:** Non-native learners of Japanese involved in the experiments
ID	M/F	AGE	NATIVE-BORN COUNTRY	JAPANESE
A	M	24	Korea	1 YR.
B	M	28	China	1 YR.
C	M	26	Taiwan	1 YR.
D	M	26	France	3 YRS.
E	M	28	Canada	2 YRS.
F	M	19	Kazakstan	4 YRS.
G	M	34	Indonesia	3 YRS.
H	M	36	Kenya	3 YRS.

(The JAPANESE column gives how long each learner has been studying Japanese as a second language, L2.)

**Table 2.4:** Acoustic analysis conditions
SAMPLING FREQUENCY	16 kHz, 16 bits
WINDOW LENGTH, TYPE	25 msec, Hamming
FRAME PERIOD	10 msec
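
To make the analysis conditions in Table 2.4 concrete, the following is a minimal sketch of how a 16 kHz signal could be cut into 25 msec Hamming-windowed frames every 10 msec. It is an illustration only, not the CAPL front end: the function names `hamming` and `frame_signal` are hypothetical, the choice to drop a trailing partial frame is an assumption, and the feature computation applied to each frame is not specified in this section and is therefore omitted.

```python
import math

# Values taken from Table 2.4.
SAMPLE_RATE_HZ = 16_000   # sampling frequency
WINDOW_MS = 25            # window length
FRAME_PERIOD_MS = 10      # frame period (shift between frame starts)


def hamming(n):
    """Return the standard symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(samples, sample_rate=SAMPLE_RATE_HZ,
                 window_ms=WINDOW_MS, period_ms=FRAME_PERIOD_MS):
    """Split samples into overlapping Hamming-windowed frames.

    Hypothetical helper: a trailing partial frame is discarded (assumption).
    """
    win_len = sample_rate * window_ms // 1000   # 400 samples at 16 kHz
    hop = sample_rate * period_ms // 1000       # 160 samples at 16 kHz
    window = hamming(win_len)
    frames = []
    for start in range(0, len(samples) - win_len + 1, hop):
        segment = samples[start:start + win_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


if __name__ == "__main__":
    one_second = [1.0] * SAMPLE_RATE_HZ
    frames = frame_signal(one_second)
    print(len(frames), len(frames[0]))
```

Under these settings each frame holds 400 samples and consecutive frames start 160 samples apart, so adjacent frames overlap by 240 samples (15 msec).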


Jo Chul-Ho
Wed Oct 13 17:59:27 JST 1999