Training (The Forward-Backward Algorithm)

In order to train an HMM, we must optimize a and b with respect to the HMM's likelihood of generating all of the output sequences in the training set, because this will maximize the HMM's chances of also correctly recognizing new data. Unfortunately, there is no closed-form solution. The best that can be done is to start with some initial values for a and b, and then to iteratively modify a and b by reestimating and improving them, until some stopping criterion is reached. An instance of this general method is the Forward-Backward Algorithm (also known as the Baum-Welch Algorithm).

We define $\beta_j(t)$ as the probability of generating the remainder of the sequence $y_{t+1}^T$, starting from state j at time t. $\beta_j(t)$ can be computed recursively in a backward direction (see Figure 2.6):

\[ \beta_j(t) \;=\; \sum_k a_{jk}\, b_{jk}(y_{t+1})\, \beta_k(t+1) \]

[Figure 2.6: The backward pass recursion in an HMM]

This recursion is initialized at time T by setting $\beta_k(T)$ to 1.0 for the final state, and 0.0 for all other states.

We define $\gamma_{ij}(t)$ as the probability of transitioning from state i to state j at time t, given that the whole output sequence $y_1^T$ has been generated by the current HMM:

\[ \gamma_{ij}(t) \;=\; P(i \rightarrow j \text{ at } t \mid y_1^T) \;=\; \frac{\alpha_i(t-1)\, a_{ij}\, b_{ij}(y_t)\, \beta_j(t)}{\sum_k \alpha_k(T)} \]

The numerator in the final equality can be understood by consulting Figure 2.7: $\alpha_i(t-1)$ accounts for the partial sequence $y_1^{t-1}$, the factor $a_{ij}\,b_{ij}(y_t)$ accounts for the transition and the emission of $y_t$, and $\beta_j(t)$ accounts for the remainder $y_{t+1}^T$. The denominator reflects the fact that the probability of generating $y_1^T$ equals the probability of generating $y_1^T$ while ending up in any of the $k$ final states.

[Figure 2.7: Deriving $\gamma_{ij}(t)$ in the Forward-Backward Algorithm]
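
A minimal NumPy sketch of these computations may help make the indexing concrete. The sketch is not from the original text: it assumes a single initial state and a single final state, ties each arc emission to its source state ($b_{ij}(u) = b_i(u)$, matching the per-state reestimate $\hat{b}_i(u)$ derived below), and uses illustrative function names (forward, backward, gammas). The forward pass $\alpha$ is the one defined in the preceding section.

import numpy as np

def forward(a, b, y, init_state=0):
    """Forward pass from the preceding section:
    alpha[t, j] = P(y_1..y_t, ending in state j at time t)."""
    N, T = a.shape[0], len(y)
    alpha = np.zeros((T + 1, N))
    alpha[0, init_state] = 1.0                    # assumed single initial state
    for t in range(1, T + 1):
        # alpha_j(t) = sum_i alpha_i(t-1) a_ij b_ij(y_t), emissions tied to i
        alpha[t] = (alpha[t - 1] * b[:, y[t - 1]]) @ a
    return alpha

def backward(a, b, y, final_state=-1):
    """Backward pass: beta[t, j] = P(generating y_{t+1}..y_T | state j at time t)."""
    N, T = a.shape[0], len(y)
    beta = np.zeros((T + 1, N))
    beta[T, final_state] = 1.0                    # 1.0 for the final state, 0.0 elsewhere
    for t in range(T - 1, -1, -1):
        # beta_j(t) = sum_k a_jk b_jk(y_{t+1}) beta_k(t+1), emissions tied to j
        beta[t] = b[:, y[t]] * (a @ beta[t + 1])
    return beta

def gammas(alpha, beta, a, b, y):
    """gamma[t, i, j] = P(transition i -> j at time t | y_1..y_T), for t = 1..T."""
    T = len(y)
    prob = alpha[-1] @ beta[-1]                   # P(y_1^T): alpha_k(T) summed over final states
    gamma = np.zeros((T + 1,) + a.shape)          # gamma[0] is unused padding
    for t in range(1, T + 1):
        # numerator: alpha_i(t-1) * a_ij * b_ij(y_t) * beta_j(t)
        gamma[t] = (alpha[t - 1][:, None] * a * b[:, y[t - 1]][:, None]
                    * beta[t][None, :]) / prob
    return gamma

Each gamma[t] sums to 1.0 over all (i, j), which makes a convenient sanity check.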

We define $N(i \rightarrow j)$ as the expected number of times that the transition from state i to state j is taken, from time 1 to T:

\[ N(i \rightarrow j) \;=\; \sum_{t=1}^{T} \gamma_{ij}(t) \]

Summing this over all destination states j, we obtain $N(i \rightarrow *)$, or N(i), which represents the expected number of times that state i is visited, from time 1 to T:

\[ N(i) \;=\; \sum_j N(i \rightarrow j) \;=\; \sum_{t=1}^{T} \sum_j \gamma_{ij}(t) \]

Selecting only those occasions when state i emits the symbol u, we obtain N(i,u):

\[ N(i,u) \;=\; \sum_{t\,:\,y_t = u} \;\sum_j \gamma_{ij}(t) \]
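
Continuing the sketch above, all three expected counts follow directly from the gammas; the names n_trans, n_visit, and n_emit are illustrative.

def expected_counts(gamma, y, n_symbols):
    """N(i->j), N(i), and N(i,u), accumulated over t = 1..T."""
    n_trans = gamma[1:].sum(axis=0)               # N(i->j) = sum_t gamma_ij(t)
    n_visit = n_trans.sum(axis=1)                 # N(i) = sum_j N(i->j)
    n_emit = np.zeros((gamma.shape[1], n_symbols))
    for t in range(1, len(y) + 1):
        # occasions when state i emits the symbol y_t
        n_emit[:, y[t - 1]] += gamma[t].sum(axis=1)
    return n_trans, n_visit, n_emit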

Finally, we can reestimate the HMM parameters a and b, yielding $\hat{a}$ and $\hat{b}$, by taking simple ratios between these terms:

\[ \hat{a}_{ij} \;=\; \frac{N(i \rightarrow j)}{N(i)} \]

\[ \hat{b}_i(u) \;=\; \frac{N(i,u)}{N(i)} \]
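
In the sketch, the reestimates are exactly these two ratios. The guard against never-visited states (where N(i) = 0) is an added practical detail, not part of the original formulas.

def reestimate(n_trans, n_visit, n_emit):
    """a_hat_ij = N(i->j) / N(i);  b_hat_i(u) = N(i,u) / N(i)."""
    denom = np.where(n_visit > 0.0, n_visit, 1.0)[:, None]  # avoid 0/0 for unvisited states
    return n_trans / denom, n_emit / denom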

It can be proven that substituting $\{\hat{a}, \hat{b}\}$ for $\{a, b\}$ always causes $P(y_1^T)$ to increase, up to a local maximum. Thus, by repeating this procedure for a number of iterations, the HMM parameters are optimized for the training data, and should generalize well to the test data.
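
Putting the sketch together, one possible training loop recomputes $\alpha$, $\beta$, and $\gamma$ under the current parameters and then replaces $\{a, b\}$ with $\{\hat{a}, \hat{b}\}$ on every pass. The fixed iteration count and the toy left-to-right model below are illustrative only; a real stopping criterion would monitor the growth of $P(y_1^T)$.

def baum_welch(a, b, y, iters=20):
    """Iteratively reestimate a and b; P(y_1^T) is non-decreasing across passes."""
    for _ in range(iters):
        alpha, beta = forward(a, b, y), backward(a, b, y)
        gamma = gammas(alpha, beta, a, b, y)
        a, b = reestimate(*expected_counts(gamma, y, b.shape[1]))
    return a, b

# Toy usage: a 3-state left-to-right model over two output symbols.
a0 = np.array([[0.6, 0.4, 0.0],
               [0.0, 0.7, 0.3],
               [0.0, 0.0, 1.0]])
b0 = np.array([[0.9, 0.1],
               [0.2, 0.8],
               [0.5, 0.5]])
a_new, b_new = baum_welch(a0, b0, y=[0, 0, 1, 1, 1])

In practice, the loop would also run over all of the output sequences in the training set, accumulating the expected counts across sequences before each reestimation.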

