In order to train an HMM, we must optimize a and b with respect to the HMM's likelihood of generating all of the output sequences in the training set, because this will maximize the HMM's chances of also correctly recognizing new data. Unfortunately, there is no closed-form solution. The best that can be done is to start with some initial values for a and b, and then to iteratively modify a and b by reestimating and improving them, until some stopping criterion is reached. An instance of this general method is the Forward-Backward Algorithm.
We define $\beta_j(t)$ as the probability of generating the remainder of the sequence $y_{t+1}^T$, starting from state $j$ at time $t$. $\beta_j(t)$ can be computed recursively in a backward direction (see Figure 2.6):

$$\beta_j(t) = \sum_k a_{jk}\, b_k(y_{t+1})\, \beta_k(t+1)$$
Figure 2.6: Backward pass recursion in HMM
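The backward recursion above can be sketched in code. This is an illustrative NumPy implementation, not from the text: it assumes discrete symbols, emission probabilities attached to states in an array `b[j, u]`, transitions `a[i, j]`, and a single accepting state (the array names and the `final_state` parameter are my own choices).

```python
import numpy as np

def backward(a, b, y, final_state):
    """Backward pass: beta[t, j] = P(generating y[t+1:] | in state j at time t).

    a: (N, N) transition probabilities a[i, j]        (assumed layout)
    b: (N, M) emission probabilities  b[j, u]         (assumed layout)
    y: observed symbol sequence of length T (integer symbols)
    final_state: index of the single accepting state  (an assumption)
    """
    N, T = a.shape[0], len(y)
    beta = np.zeros((T, N))
    # Initialization at the final time step: 1.0 for the final state,
    # 0.0 for all other states, as described in the text.
    beta[T - 1, final_state] = 1.0
    for t in range(T - 2, -1, -1):
        # beta_j(t) = sum_k a[j, k] * b[k, y[t+1]] * beta_k(t+1)
        beta[t] = a @ (b[:, y[t + 1]] * beta[t + 1])
    return beta
```

For a 2-state model with `a = [[0.6, 0.4], [0.0, 1.0]]`, `b = [[0.7, 0.3], [0.2, 0.8]]` and the sequence `[0, 1]`, the recursion gives `beta[0] = [0.32, 0.8]`.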
This recursion is initialized at time $T$ by setting $\beta_k(T)$ to 1.0 for the final state, and 0.0 for all other states. We define $\gamma_{ij}(t)$ as the probability of transitioning from state $i$ to state $j$ at time $t$, given that the whole output sequence $y_1^T$ has been generated by the current HMM:

$$\gamma_{ij}(t) = P(i \rightarrow j \mid y_1^T) = \frac{P(i \rightarrow j,\; y_1^T)}{P(y_1^T)} = \frac{\alpha_i(t)\, a_{ij}\, b_j(y_{t+1})\, \beta_j(t+1)}{\sum_k \alpha_k(T)}$$
The numerator in the final equality can be understood by consulting Figure 2.7. The denominator reflects the fact that the probability of generating $y_1^T$ equals the probability of generating $y_1^T$ while ending up in any of $k$ final states.
Figure 2.7: Deriving $\gamma_{ij}(t)$ in the Forward-Backward Algorithm
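The $\gamma_{ij}(t)$ computation can be sketched as follows. This is an assumed NumPy layout, not from the text: `alpha[t, i]` is taken to be the forward probability from the standard forward pass, `beta` comes from the backward pass above, and the denominator is computed as the forward probability mass that ends in a final state (where `beta[T-1]` is nonzero only for final states).

```python
import numpy as np

def gamma(alpha, beta, a, b, y):
    """g[t, i, j] = P(transition i -> j at time t | whole sequence y).

    alpha: (T, N) forward probabilities  (assumed precomputed)
    beta:  (T, N) backward probabilities (assumed precomputed)
    a: (N, N) transitions, b: (N, M) emissions, y: symbol sequence.
    Defined for t = 0 .. T-2, since the numerator uses y[t+1].
    """
    T, N = alpha.shape
    g = np.zeros((T - 1, N, N))
    # P(y_1^T) = sum over final states k of alpha_k(T); beta[T-1] is an
    # indicator of the final state(s), so this dot product selects them.
    total = alpha[T - 1] @ beta[T - 1]
    for t in range(T - 1):
        # numerator: alpha_i(t) * a[i, j] * b[j, y[t+1]] * beta_j(t+1)
        g[t] = alpha[t][:, None] * a * (b[:, y[t + 1]] * beta[t + 1])[None, :]
    return g / total
```

Since every path through the model passes through exactly one transition at each time step, the entries of `g[t]` sum to 1 for each `t`.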
We define $N(i \rightarrow j)$ as the expected number of times that the transition from state $i$ to state $j$ is taken, from time 1 to $T$:

$$N(i \rightarrow j) = \sum_{t=1}^{T-1} \gamma_{ij}(t)$$
Summing this over all destination states $j$, we obtain $N(i)$, which represents the expected number of times that state $i$ is visited, from time 1 to $T$:

$$N(i) = \sum_j N(i \rightarrow j) = \sum_j \sum_{t=1}^{T-1} \gamma_{ij}(t)$$
Selecting only those occasions when state $i$ emits the symbol $u$, we obtain $N(i, u)$:

$$N(i, u) = \sum_{t:\, y_t = u}\; \sum_j \gamma_{ij}(t)$$
Finally, we can reestimate the HMM parameters $a$ and $b$, yielding $\hat{a}$ and $\hat{b}$, by taking simple ratios between these terms:

$$\hat{a}_{ij} = \frac{N(i \rightarrow j)}{N(i)}, \qquad \hat{b}_i(u) = \frac{N(i, u)}{N(i)}$$
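The counting and ratio steps can be sketched directly from a precomputed array of $\gamma_{ij}(t)$ values. This is an illustrative NumPy version under the same assumed layouts as before; attributing the emission at time `t` to the source state of the transition at `t` is one possible convention (boundary conventions for the final frame vary between formulations).

```python
import numpy as np

def reestimate(g, y, n_symbols):
    """Reestimate a and b from expected counts (a sketch, not from the text).

    g: (T-1, N, N) array with g[t, i, j] = gamma_ij(t)
    y: the observed symbol sequence
    n_symbols: size of the output alphabet
    Returns (a_hat, b_hat) with a_hat[i,j] = N(i->j)/N(i),
    b_hat[i,u] = N(i,u)/N(i).
    """
    N_ij = g.sum(axis=0)                  # N(i->j): expected transition counts
    N_i = N_ij.sum(axis=1)                # N(i): expected visits to state i
    N_iu = np.zeros((g.shape[1], n_symbols))
    for t in range(g.shape[0]):
        # occasions when state i emits symbol y[t]
        N_iu[:, y[t]] += g[t].sum(axis=1)
    # Guard against 0/0 for states that are never visited.
    denom = np.where(N_i > 0, N_i, 1.0)[:, None]
    return N_ij / denom, N_iu / denom
```

Rows of `a_hat` and `b_hat` corresponding to visited states sum to 1 by construction, so the result is again a valid set of HMM parameters.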
It can be proven that substituting $\hat{a}$ and $\hat{b}$ for $a$ and $b$ always causes $P(y_1^T)$ to increase, up to a local maximum. Thus, by repeating this procedure for a number of iterations, the HMM parameters are gradually optimized for the training data and, one hopes, will generalize well to the test data.
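Putting the pieces together, the whole iterative procedure can be sketched as a single loop: forward pass, backward pass, expected counts, reestimation, repeat. Again this is a rough, self-contained sketch under the same assumptions (discrete emissions on states, a single accepting state, assumed array layouts), with no scaling or log-space arithmetic, which real implementations need for long sequences.

```python
import numpy as np

def baum_welch(a, b, pi, y, final, iters=10):
    """Iterate the Forward-Backward reestimation (a sketch, not from the text).

    a: (N, N) transitions, b: (N, M) emissions, pi: (N,) initial state
    probabilities, y: symbol sequence, final: accepting state index.
    Returns updated (a, b) and the likelihood observed at each iteration.
    """
    T, N = len(y), a.shape[0]
    likelihoods = []
    for _ in range(iters):
        # Forward pass: alpha[t, i] = P(y[:t+1], state i at time t)
        alpha = np.zeros((T, N))
        alpha[0] = pi * b[:, y[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ a) * b[:, y[t]]
        # Backward pass, initialized at the final state only
        beta = np.zeros((T, N))
        beta[T - 1, final] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = a @ (b[:, y[t + 1]] * beta[t + 1])
        p = alpha[T - 1, final]          # P(y, ending in the final state)
        likelihoods.append(p)
        # Expected counts via gamma_ij(t)
        N_ij = np.zeros((N, N))
        N_iu = np.zeros_like(b)
        for t in range(T - 1):
            g = alpha[t][:, None] * a * (b[:, y[t + 1]] * beta[t + 1])[None, :] / p
            N_ij += g
            N_iu[:, y[t]] += g.sum(axis=1)
        # Reestimate by simple ratios, guarding unvisited states against 0/0
        N_i = N_ij.sum(axis=1)
        denom = np.where(N_i > 0, N_i, 1.0)[:, None]
        a, b = N_ij / denom, N_iu / denom
    return a, b, likelihoods
```

Running this on a small example shows the likelihood climbing across iterations, in line with the convergence claim above.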