Phoneme Models

In order to analyze the speech frames for their acoustic content, a set of acoustic models is necessary. There are many possible choices for the modeling unit, e.g., words or sentences, but our models are based on phonemes. Each phoneme is modeled by a sequence of trainable states, and each state characterizes the sub-sounds occurring in that segment of the phoneme by a probability distribution over the acoustic space. Probability distributions can be modeled parametrically, by assuming that they have a simple shape (e.g., a Gaussian distribution), or non-parametrically, by representing the distribution directly (e.g., Vector Quantization codebooks).
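As a concrete illustration of the parametric case, the following minimal sketch (Python, not taken from the CAPL implementation) scores one acoustic feature vector against a single state modeled by a diagonal-covariance Gaussian. The function name log_gaussian, the 39-dimensional feature vector, and the placeholder parameter values are assumptions for illustration only.

    import numpy as np

    def log_gaussian(x, mean, var):
        # Log-density of a diagonal-covariance Gaussian evaluated at
        # one acoustic feature vector x (element-wise variances in var).
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

    # Score one placeholder speech frame against one state's output distribution.
    dim = 39                     # feature dimension: an assumption, not from the text
    frame = np.zeros(dim)        # placeholder acoustic feature vector
    state_mean = np.zeros(dim)   # state mean (would be estimated during training)
    state_var = np.ones(dim)     # state variances (diagonal covariance)
    print(log_gaussian(frame, state_mean, state_var))

In practice a continuous density HMM state often uses a mixture of such Gaussians rather than a single one; the non-parametric alternative mentioned above would instead look up the probability of the nearest Vector Quantization codeword.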

Acoustic models based on continuous density HMMs are used in the CAPL system. We trained the acoustic models with the ASJ speech databases of phonetically balanced sentences (ASJ-PB) and newspaper article texts (ASJ-JNAS). In total, about 20K sentences uttered by 132 male speakers were available. The set of 43 Japanese phones is listed in Table 2.2. The phoneme notation is defined by the Acoustical Society of Japan (ASJ) committee on speech databases. Here, the symbols /a:/, /i:/, /u:/, /e:/, /o:/ stand for long vowels and the symbol /q/ for a double consonant. Three pause models, /silB/, /silE/ and /sp/, are introduced for pauses at the beginning of an utterance, at the end of an utterance, and between words, respectively.

  a    i    u    e    o    a:   i:   u:   e:   o:   N    w    y
  p    py   t    k    ky   b    by   d    dy   g    gy   ts   ch
  m    my   n    ny   h    hy   f    s    sh   z    j    r    ry
  q    sp   silB silE

Table 2.2: List of Japanese phonemes used in speech recognition

Figure 2.8: A schematic of a left-to-right HMM

Figure 2.9: HMM trellis and the Viterbi path
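To make the left-to-right topology of Figure 2.8 and the trellis of Figure 2.9 concrete, the following minimal sketch builds a toy three-state left-to-right HMM and recovers the Viterbi path through its trellis by dynamic programming. The number of states, the transition probabilities, and the per-frame emission scores are illustrative placeholders, not parameters of the actual acoustic models.

    import numpy as np

    NEG_INF = -1e30  # stands in for log(0), i.e. a forbidden transition

    # Three-state left-to-right HMM: each state may only loop on itself
    # or advance to the next state (cf. Figure 2.8).
    log_trans = np.array([
        [np.log(0.6), np.log(0.4), NEG_INF],
        [NEG_INF,     np.log(0.6), np.log(0.4)],
        [NEG_INF,     NEG_INF,     0.0],        # final state self-loops
    ])

    # Placeholder per-frame emission log-likelihoods (6 frames x 3 states);
    # in the real system these would come from the state output densities.
    log_emit = np.array([
        [-1.0, -5.0, -9.0],
        [-1.2, -4.0, -8.0],
        [-4.0, -1.0, -6.0],
        [-5.0, -1.5, -3.0],
        [-7.0, -3.0, -1.0],
        [-8.0, -4.0, -0.5],
    ])

    T, S = log_emit.shape
    delta = np.full((T, S), NEG_INF)      # best log score of a path ending in (t, s)
    backptr = np.zeros((T, S), dtype=int)

    delta[0, 0] = log_emit[0, 0]          # a left-to-right model starts in its first state
    for t in range(1, T):
        for s in range(S):
            scores = delta[t - 1] + log_trans[:, s]
            backptr[t, s] = int(np.argmax(scores))
            delta[t, s] = scores[backptr[t, s]] + log_emit[t, s]

    # Backtrack from the final state to recover the Viterbi path (cf. Figure 2.9).
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(backptr[t, path[-1]])
    path.reverse()
    print(path)   # -> [0, 0, 1, 1, 2, 2]

In the actual decoder the emission scores come from the continuous density output distributions described above, and the trellis would span the concatenated phoneme models of a whole utterance rather than a single toy model.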

