In order to train an HMM, we must optimize a and b with respect to the HMM's likelihood of generating all of the output sequences in the training set, because this will maximize the HMM's chances of also correctly recognizing new data. Unfortunately, there is no closed-form solution. The best that can be done is to start with some initial values for a and b, and then to iteratively modify a and b by reestimating and improving them, until some stopping criterion is reached. An instance of this general method is the Forward-Backward Algorithm.
We define $\beta_j(t)$ as the probability of generating the remainder of the sequence $y_{t+1}^T$, starting from state $j$ at time $t$. $\beta_j(t)$ can be computed recursively in a backward direction (see Figure 2.6):

$$\beta_j(t) = \sum_k a_{jk}\, b_k(y_{t+1})\, \beta_k(t+1)$$
Figure 2.6: Backward pass recursion in HMM
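The backward recursion above can be sketched in code. This is an illustrative NumPy implementation, not from the text: it assumes discrete symbols, emission probabilities attached to states in an array `b[j, u]`, transitions `a[i, j]`, and a single accepting state (the array names and the `final_state` parameter are my own choices).

```python
import numpy as np

def backward(a, b, y, final_state):
    """Backward pass: beta[t, j] = P(generating y[t+1:] | in state j at time t).

    a: (N, N) transition probabilities a[i, j]        (assumed layout)
    b: (N, M) emission probabilities  b[j, u]         (assumed layout)
    y: observed symbol sequence of length T (integer symbols)
    final_state: index of the single accepting state  (an assumption)
    """
    N, T = a.shape[0], len(y)
    beta = np.zeros((T, N))
    # Initialization at the final time step: 1.0 for the final state,
    # 0.0 for all other states, as described in the text.
    beta[T - 1, final_state] = 1.0
    for t in range(T - 2, -1, -1):
        # beta_j(t) = sum_k a[j, k] * b[k, y[t+1]] * beta_k(t+1)
        beta[t] = a @ (b[:, y[t + 1]] * beta[t + 1])
    return beta
```

For a 2-state model with `a = [[0.6, 0.4], [0.0, 1.0]]`, `b = [[0.7, 0.3], [0.2, 0.8]]` and the sequence `[0, 1]`, the recursion gives `beta[0] = [0.32, 0.8]`.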
This recursion is initialized at time $T$ by setting $\beta_k(T)$ to 1.0 for the final state, and 0.0 for all other states. We define $\gamma_{ij}(t)$ as the probability of transitioning from state $i$ to state $j$ at time $t$, given that the whole output sequence $y_1^T$ has been generated by the current HMM:

$$\gamma_{ij}(t) = P(i \rightarrow j \mid y_1^T) = \frac{P(i \rightarrow j,\; y_1^T)}{P(y_1^T)} = \frac{\alpha_i(t)\, a_{ij}\, b_j(y_{t+1})\, \beta_j(t+1)}{\sum_k \alpha_k(T)}$$
The numerator in the final equality can be understood by consulting Figure 2.7. The denominator reflects the fact that the probability of generating $y_1^T$ equals the probability of generating $y_1^T$ while ending up in any of $k$ final states.
Figure 2.7: Deriving $\gamma_{ij}(t)$ in the Forward-Backward Algorithm
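The $\gamma_{ij}(t)$ computation can be sketched as follows. This is an assumed NumPy layout, not from the text: `alpha[t, i]` is taken to be the forward probability from the standard forward pass, `beta` comes from the backward pass above, and the denominator is computed as the forward probability mass that ends in a final state (where `beta[T-1]` is nonzero only for final states).

```python
import numpy as np

def gamma(alpha, beta, a, b, y):
    """g[t, i, j] = P(transition i -> j at time t | whole sequence y).

    alpha: (T, N) forward probabilities  (assumed precomputed)
    beta:  (T, N) backward probabilities (assumed precomputed)
    a: (N, N) transitions, b: (N, M) emissions, y: symbol sequence.
    Defined for t = 0 .. T-2, since the numerator uses y[t+1].
    """
    T, N = alpha.shape
    g = np.zeros((T - 1, N, N))
    # P(y_1^T) = sum over final states k of alpha_k(T); beta[T-1] is an
    # indicator of the final state(s), so this dot product selects them.
    total = alpha[T - 1] @ beta[T - 1]
    for t in range(T - 1):
        # numerator: alpha_i(t) * a[i, j] * b[j, y[t+1]] * beta_j(t+1)
        g[t] = alpha[t][:, None] * a * (b[:, y[t + 1]] * beta[t + 1])[None, :]
    return g / total
```

Since every path through the model passes through exactly one transition at each time step, the entries of `g[t]` sum to 1 for each `t`.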
We define $N(i \rightarrow j)$ as the expected number of times that the transition from state $i$ to state $j$ is taken, from time 1 to $T$:

$$N(i \rightarrow j) = \sum_{t=1}^{T-1} \gamma_{ij}(t)$$
Summing this over all destination states $j$, we obtain $N(i)$, which represents the expected number of times that state $i$ is visited, from time 1 to $T$:

$$N(i) = \sum_j N(i \rightarrow j) = \sum_j \sum_{t=1}^{T-1} \gamma_{ij}(t)$$
Selecting only those occasions when state $i$ emits the symbol $u$, we obtain $N(i, u)$:

$$N(i, u) = \sum_{t:\, y_t = u}\; \sum_j \gamma_{ij}(t)$$
Finally, we can reestimate the HMM parameters $a$ and $b$, yielding $\hat{a}$ and $\hat{b}$, by taking simple ratios between these terms:

$$\hat{a}_{ij} = \frac{N(i \rightarrow j)}{N(i)}, \qquad \hat{b}_i(u) = \frac{N(i, u)}{N(i)}$$
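The counting and ratio steps can be sketched directly from a precomputed array of $\gamma_{ij}(t)$ values. This is an illustrative NumPy version under the same assumed layouts as before; attributing the emission at time `t` to the source state of the transition at `t` is one possible convention (boundary conventions for the final frame vary between formulations).

```python
import numpy as np

def reestimate(g, y, n_symbols):
    """Reestimate a and b from expected counts (a sketch, not from the text).

    g: (T-1, N, N) array with g[t, i, j] = gamma_ij(t)
    y: the observed symbol sequence
    n_symbols: size of the output alphabet
    Returns (a_hat, b_hat) with a_hat[i,j] = N(i->j)/N(i),
    b_hat[i,u] = N(i,u)/N(i).
    """
    N_ij = g.sum(axis=0)                  # N(i->j): expected transition counts
    N_i = N_ij.sum(axis=1)                # N(i): expected visits to state i
    N_iu = np.zeros((g.shape[1], n_symbols))
    for t in range(g.shape[0]):
        # occasions when state i emits symbol y[t]
        N_iu[:, y[t]] += g[t].sum(axis=1)
    # Guard against 0/0 for states that are never visited.
    denom = np.where(N_i > 0, N_i, 1.0)[:, None]
    return N_ij / denom, N_iu / denom
```

Rows of `a_hat` and `b_hat` corresponding to visited states sum to 1 by construction, so the result is again a valid set of HMM parameters.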
It can be proven that substituting $\hat{a}$ and $\hat{b}$ for $a$ and $b$ always causes $P(y_1^T)$ to increase, up to a local maximum. Thus, by repeating this procedure for a number of iterations, the HMM parameters are gradually optimized for the training data and, one hopes, will generalize well to the test data.
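Putting the pieces together, the whole iterative procedure can be sketched as a single loop: forward pass, backward pass, expected counts, reestimation, repeat. Again this is a rough, self-contained sketch under the same assumptions (discrete emissions on states, a single accepting state, assumed array layouts), with no scaling or log-space arithmetic, which real implementations need for long sequences.

```python
import numpy as np

def baum_welch(a, b, pi, y, final, iters=10):
    """Iterate the Forward-Backward reestimation (a sketch, not from the text).

    a: (N, N) transitions, b: (N, M) emissions, pi: (N,) initial state
    probabilities, y: symbol sequence, final: accepting state index.
    Returns updated (a, b) and the likelihood observed at each iteration.
    """
    T, N = len(y), a.shape[0]
    likelihoods = []
    for _ in range(iters):
        # Forward pass: alpha[t, i] = P(y[:t+1], state i at time t)
        alpha = np.zeros((T, N))
        alpha[0] = pi * b[:, y[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ a) * b[:, y[t]]
        # Backward pass, initialized at the final state only
        beta = np.zeros((T, N))
        beta[T - 1, final] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = a @ (b[:, y[t + 1]] * beta[t + 1])
        p = alpha[T - 1, final]          # P(y, ending in the final state)
        likelihoods.append(p)
        # Expected counts via gamma_ij(t)
        N_ij = np.zeros((N, N))
        N_iu = np.zeros_like(b)
        for t in range(T - 1):
            g = alpha[t][:, None] * a * (b[:, y[t + 1]] * beta[t + 1])[None, :] / p
            N_ij += g
            N_iu[:, y[t]] += g.sum(axis=1)
        # Reestimate by simple ratios, guarding unvisited states against 0/0
        N_i = N_ij.sum(axis=1)
        denom = np.where(N_i > 0, N_i, 1.0)[:, None]
        a, b = N_ij / denom, N_iu / denom
    return a, b, likelihoods
```

Running this on a small example shows the likelihood climbing across iterations, in line with the convergence claim above.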