In order to train an HMM, we must optimize a and b with respect to
the HMM's likelihood of generating all of the output sequences in the
training set, because this will maximize the HMM's chances of also
correctly recognizing new data. Unfortunately, there is no closed
form solution. The best that can be done is to start with some initial
values for a and b, and then to iteratively modify a and b by
reestimating and improving them, until some stopping criterion is
reached. An instance of this general method is the Forward-Backward
Algorithm
.
We define $\beta_j(t)$ as the probability of generating the remainder of the sequence, $y_{t+1}^{T}$, starting from state $j$ at time $t$. $\beta_j(t)$ can be computed recursively in a backward direction (see Figure 2.6):

$$\beta_j(t) = \sum_k a_{jk}\, b_k(y_{t+1})\, \beta_k(t+1)$$
Figure 2.6: Backward pass recursion in an HMM.
This recursion is initialized at time $T$ by setting $\beta_k(T)$ to 1.0 for the final state, and 0.0 for all other states.
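As a concrete illustration, the backward recursion and its initialization fit in a few lines of NumPy. This is only a sketch under assumed conventions that do not come from the text: `A[i, j]` holds the transition probability $a_{ij}$, `B[j, u]` holds the probability of state $j$ emitting symbol $u$, `obs` is the integer-coded output sequence $y_1 \dots y_T$, and `final_state` is the index of the HMM's final state.

```python
import numpy as np

def backward_pass(A, B, obs, final_state):
    """Sketch: beta[t, j] = probability of generating y_{t+1}..y_T from state j at time t."""
    T, S = len(obs), A.shape[0]
    beta = np.zeros((T, S))
    beta[T - 1, final_state] = 1.0       # initialization: 1.0 for the final state, 0.0 elsewhere
    for t in range(T - 2, -1, -1):       # recurse backward through time
        # beta_j(t) = sum_k a_jk * b_k(y_{t+1}) * beta_k(t+1)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta
```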
We define $\gamma_{ij}(t)$ as the probability of transitioning from state $i$ to state $j$ at time $t$, given that the whole output sequence $y_1^T$ has been generated by the current HMM:

$$\gamma_{ij}(t) \;=\; P(i \rightarrow j \text{ at } t \mid y_1^T) \;=\; \frac{P(i \rightarrow j \text{ at } t,\; y_1^T)}{P(y_1^T)} \;=\; \frac{\alpha_i(t)\, a_{ij}\, b_j(y_{t+1})\, \beta_j(t+1)}{\sum_k \alpha_k(T)}$$

where $\alpha_i(t)$ is the forward probability of generating the partial sequence $y_1^t$ and ending up in state $i$ at time $t$, as defined earlier.
The numerator in the final equality can be understood by consulting
Figure 2.7. The denominator reflects the fact that the
probability of generating $y_1^T$ equals the probability of generating
$y_1^T$ while ending up in any of the $k$ final states.
Figure 2.7: Deriving $\gamma_{ij}(t)$ in the Forward-Backward Algorithm.
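Continuing the same sketch, $\gamma_{ij}(t)$ can be assembled from the forward and backward probabilities. Here `alpha[t, i]` is assumed to hold the forward probability $\alpha_i(t)$ from the earlier forward pass; as before, the array names are illustrative rather than taken from the text.

```python
def transition_posteriors(alpha, beta, A, B, obs):
    """Sketch: gamma[t, i, j] = P(transition i -> j at time t | whole output sequence)."""
    T, S = alpha.shape
    prob_obs = alpha[T - 1].sum()        # P(y_1..y_T) = sum over final states k of alpha_k(T)
    gamma = np.zeros((T - 1, S, S))      # a transition out of time t exists for t = 1 .. T-1
    for t in range(T - 1):
        # numerator: alpha_i(t) * a_ij * b_j(y_{t+1}) * beta_j(t+1); denominator: P(y_1..y_T)
        gamma[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :] / prob_obs
    return gamma
```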
We define $N(i \rightarrow j)$ as the expected number of times that the transition from state $i$ to state $j$ is taken, from time 1 to $T$:

$$N(i \rightarrow j) = \sum_t \gamma_{ij}(t)$$
Summing this over all destination states $j$, we obtain $N(i \rightarrow *)$, or $N(i)$, which represents the expected number of
times that state $i$ is visited, from time 1 to $T$:

$$N(i) = \sum_j N(i \rightarrow j)$$
Selecting only those occasions when state $i$ emits the symbol $u$, we obtain $N(i, u)$:

$$N(i, u) = \sum_{t\,:\,y_t = u}\; \sum_j \gamma_{ij}(t)$$
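The three expected counts then reduce to sums over the `gamma` array from the sketch above (the last time step has no outgoing transition, so the sums in this 0-indexed version run over the $T-1$ transitions that exist); `num_symbols` is an assumed parameter giving the size of the output alphabet.

```python
def expected_counts(gamma, obs, num_symbols):
    """Sketch: accumulate N(i->j), N(i) and N(i, u) from the transition posteriors."""
    S = gamma.shape[1]
    N_trans = gamma.sum(axis=0)          # N(i->j) = sum_t gamma_ij(t)
    N_state = N_trans.sum(axis=1)        # N(i)    = sum_j N(i->j)
    occupancy = gamma.sum(axis=2)        # sum_j gamma_ij(t): expected occupancy of state i at time t
    N_emit = np.zeros((S, num_symbols))
    for t in range(len(gamma)):          # select only the times at which symbol u = y_t was emitted
        N_emit[:, obs[t]] += occupancy[t]    # N(i, u) = sum over {t : y_t = u} of sum_j gamma_ij(t)
    return N_trans, N_state, N_emit
```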
Finally, we can reestimate the HMM parameters $a$ and $b$, yielding $\bar{a}$ and $\bar{b}$, by taking simple ratios between
these terms:

$$\bar{a}_{ij} = \frac{N(i \rightarrow j)}{N(i)}, \qquad \bar{b}_i(u) = \frac{N(i, u)}{N(i)}$$
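In code, the reestimation formulas are direct element-wise ratios of these counts; the small `eps` guard for states that are never visited is an implementation detail of this sketch, not part of the formulas.

```python
def reestimate(N_trans, N_state, N_emit, eps=1e-12):
    """Sketch: a_ij <- N(i->j) / N(i),  b_i(u) <- N(i, u) / N(i)."""
    denom = np.maximum(N_state, eps)[:, None]   # avoid dividing by zero for unvisited states
    return N_trans / denom, N_emit / denom
```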
It can be proven that substituting $\{\bar{a}, \bar{b}\}$ for $\{a, b\}$ always causes $P(y_1^T)$ to increase, up to a local
maximum. Thus, by repeating this procedure for a number of
iterations, the HMM parameters are optimized for the training data,
and will hopefully generalize well to the test data.
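Putting the pieces together, one hedged way to realize the overall procedure is the loop below, which reestimates $a$ and $b$ from a single training sequence for a fixed number of iterations (a convergence test on $P(y_1^T)$ could replace the fixed count). The `forward_pass` shown here is only an assumed companion to the earlier sketches, with the model taken to start in a single designated state `init_state`.

```python
def forward_pass(A, B, obs, init_state):
    """Sketch: alpha[t, j] = probability of generating y_1..y_t and ending in state j at time t."""
    T, S = len(obs), A.shape[0]
    alpha = np.zeros((T, S))
    alpha[0, init_state] = B[init_state, obs[0]]     # assumed single start state
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def train(A, B, obs, init_state, final_state, iterations=20):
    """Sketch: repeated Forward-Backward reestimation of the HMM parameters a and b."""
    for _ in range(iterations):
        alpha = forward_pass(A, B, obs, init_state)
        beta = backward_pass(A, B, obs, final_state)
        gamma = transition_posteriors(alpha, beta, A, B, obs)
        A, B = reestimate(*expected_counts(gamma, obs, B.shape[1]))
    return A, B
```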