Various general purpose subsystems exist within CHATR that allow the creation of new (and parameterizing of existing) modules.
How can you get specific data from an utterance in a uniform way? The answer is most likely through a feature function. A feature function takes a stream cell and returns a value (a character string). This mechanism is designed so vectors of parameters can easily and uniformly be extracted from an utterance. For example, suppose you want the vowel type, lexical stress, accent and duration of all syllables. This is the mechanism you should use to do it.
From Lisp you can get sets of vectors for items in a stream
within an utterance. This mechanism should be used to extract
data to use within external training systems--or possibly one
of the training sub-systems described in the following section.
The general Lisp function to use is Feats_Out. It takes three or four arguments: an utterance, a stream name, a list of features to extract, and optionally a filename to write them to. For example
(Feats_Out utt1 'Syl '(syl_vowel stress Syl_tobi_accent Syl_dur))
This will return a list of values for each syllable in `utt1'.
A large number of feature functions are already defined in the file `src/chatr/feats.c'. Their names and a short description of each can be listed from Lisp by calling Feats_Out with no arguments.
This function is especially useful in dumping features from PhonoForm utterances. See section PhonoForm Utterance Types, for a full description.
Note that some of these functions are very specific to particular types of utterances and are not general functions. This mechanism has already been used within the system for various decision trees, and for duration and intonation systems using linear regression and neural nets.
If you are writing new functions, they should be added to the table in `feats.c'. Care should be taken that they do not leak memory. This is not usually a problem, but if numbers are to be returned they must be converted to strings, and those strings may not be garbage collected. In this rare case, copy the technique used in Syl_dur, where the new string is added to a list of strings to be freed later. Note that returning numbers is actually quite rare; in most cases quantized versions are quite adequate. Do try to avoid memory leaks, as they can cause much trouble.
Linear regression is used in a number of different places within CHATR, such as intonation and duration modelling, as well as weight training for unit selection. It is also available through a Lisp interface so models can be trained within CHATR. Normally, full experiments would be done outside CHATR in something like S-PLUS or MATLAB, but once a model is decided on it is easier to train within CHATR, especially as data may be much more readily available within the system than externally.
Two Lisp functions provide the interface, one for training and one for testing. Data for training and testing will normally be calculated via the feature functions described above. See section Training a ToBI-Based F0 Prediction Model, for an example of model building using linear regression.
First, collect vectors into a file. A vector consists of a value (to be predicted) and a set of feature values. These should be put in a file, one vector per line, with parentheses at the start and end, so that each vector is readable in one Lisp read.
Given such a vector file, and a list of feature names (including a name for the first item in a vector list), use the command
(Linear_Regression_Detail vector_file features_names '2)
The results consist of a list of details about various aspects of the linear regression model built. The third parameter controls the amount of information returned by the call. The valid values are 0, 1 and 2, in increasing order of detail; level 0 returns just the model coefficients (the intercept followed by the feature weights).
It is recommended that the results from linear regression be checked to find out which features are contributing the most, and also which give little contribution or none at all (i.e. they are fully predictable from other features).
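To make the coefficients concrete, the following is a minimal sketch (with a hypothetical helper name, not CHATR's implementation) of an ordinary least squares fit for a single feature, showing what the intercept and per-feature weight in the results correspond to:

```c
#include <stddef.h>

/* Hypothetical helper (not part of CHATR): ordinary least squares for a
 * single feature, fitting y = intercept + weight * x. */
static void simple_lr_fit(const double *x, const double *y, size_t n,
                          double *intercept, double *weight)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    /* standard closed-form solution for one predictor */
    *weight = ((double)n * sxy - sx * sy) / ((double)n * sxx - sx * sx);
    *intercept = (sy - *weight * sx) / (double)n;
}
```

Fitting the points (0,1), (1,3), (2,5) gives an intercept of 1.0 and a weight of 2.0, i.e. the model y = 1 + 2x.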
A function called Linear_Regression_Predict allows the testing of models, typically against a test set. Like Linear_Regression_Detail described above, it requires a vector file in the same format: each vector consists of a value to be predicted (the dependent variable) followed by the features used to predict it, with parentheses at the start and end. A `MODEL' is also required, consisting of a list of floats, the first being the intercept and the rest the weights for each feature (i.e. as returned by a level 0 linear regression, or extracted from the more detailed results). An optional third argument to Linear_Regression_Predict specifies a file to which the predicted values are written, so external tests may be carried out. The correlation and statistics for mean error are printed to the screen.
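The prediction itself is straightforward; the following sketch (a hypothetical helper, not the CHATR source) shows how a `MODEL' of the form (intercept w1 w2 ...) is applied to one vector of feature values:

```c
#include <stddef.h>

/* Hypothetical helper (not the CHATR source): apply a MODEL of the form
 * (intercept w1 w2 ...) to one vector of feature values, as
 * Linear_Regression_Predict does for each vector in the file. */
static double lr_apply(const double *model, const double *feats,
                       size_t nfeats)
{
    double pred = model[0];               /* first float: the intercept */
    size_t i;

    for (i = 0; i < nfeats; i++)
        pred += model[i + 1] * feats[i];  /* remaining floats: weights */
    return pred;
}
```

For example, the model (1.0 2.0 0.5) applied to the features (3.0 4.0) predicts 1.0 + 2.0*3.0 + 0.5*4.0 = 9.0.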
A representation of a linear regression model structure is provided internally, with conversion functions from Lisp, and use within a model.
Before using any of the following functions, you must include
#include "lr.h"
The function make_lr_model takes a Lisp list of pairs, each consisting of a feature name plus a weight, plus one pair named Intercept. An optional third value may appear in the (now misnamed) `pair'; this should be a feature map name. Feature maps are defined in the variable feature_maps and consist of a name and a list of values. If a feature returns a value in a feature map's list then its value is treated as 1, otherwise it is treated as 0. This is designed to deal efficiently with category-valued features. An example best illustrates this. Suppose we have a feature that returns the ToBI accent label on a syllable. Labels like H* and L* are of course not valid numbers and hence cannot be used directly in a linear regression model, but with a feature map the feature can be used in an LR model. Suppose we have the following feature maps
(set feature_maps '((tobi_accent_0 H*)
                    (tobi_accent_1 !H*)
                    (tobi_accent_2 L*)
                    (tobi_accent_3 L+H* L+!H* H+!H* L*+!H L*+H)))
We can then in our linear regression model specify pairs as
...
(tobi_accent 13.3743 tobi_accent_0)
(tobi_accent 12.5658 tobi_accent_1)
(tobi_accent -17.0276 tobi_accent_2)
(tobi_accent 5.6093 tobi_accent_3)
...
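The effect of such a pair can be sketched as follows; the helper name here is illustrative, not CHATR's internal representation. A category-valued feature contributes its weight when its value appears in the named feature map's list, and nothing otherwise:

```c
#include <string.h>

/* Illustrative sketch (not CHATR's internal representation): the
 * contribution of one (feature weight map-name) pair.  If the feature's
 * current value appears in the feature map's list the term counts as 1
 * and the weight is added; otherwise it counts as 0. */
static double feature_map_term(const char *value, double weight,
                               const char *const *map_values, int n_values)
{
    int i;
    for (i = 0; i < n_values; i++)
        if (strcmp(value, map_values[i]) == 0)
            return weight;      /* value is in the map: treated as 1 */
    return 0.0;                 /* value not in the map: treated as 0 */
}
```

With the tobi_accent_3 map above, an accent of H+!H* contributes 5.6093 to the prediction through that pair, while H* contributes nothing through it (it is matched by the tobi_accent_0 pair instead).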
In C we can convert such a Lisp representation of a linear regression model to a more efficient internal one and then use it. The following short example illustrates this.
List lisp_lr_model;
LR_Model lr_model;
Stream s;

lisp_lr_model = list_str_eval("lr_model","no LR model set");
lr_model = make_lr_model(lisp_lr_model);
for (s=utt_stream("Segment",utt); s != SNIL; s=SC_next(s))
    SC(s,Segment)->duration = lr_predict(s,lr_model);
CHATR includes a subsystem for training, testing and using neural nets. Although it offers some flexibility, it has been specifically designed to implement Campbell's duration theory. Hence the basic structure is fixed. However, a choice in number of inputs, hidden nodes and outputs is available. Each net consists of an input layer, a hidden layer and an output layer. All input nodes are connected to all hidden nodes and all hidden nodes are connected to all output nodes. The nets are used to produce a single value between 0 and 1 (which we understand is a slightly unusual use of neural nets).
Input values must be single digits between 0 and 9. If all inputs for all training data are 0 or 1, the inputs are treated as binary and are rescaled. The actual inputs (and outputs) are rescaled to be greater than 0 but less than 1, however the scaling factors are internal to the implementation, so users need not worry about them. The outputs are linearly scaled from the examples in the training set by the function
O{net} = (O{actual}-Min{out}+epsilon)/(Max{out}-Min{out}+epsilon)
Some other mapping function may be considered to be more appropriate.
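The scaling above, and its inverse for mapping a net output back into the original range, can be sketched as follows; the function names are illustrative only, and min_out, max_out and epsilon come from the training set:

```c
/* Sketch of the output scaling described above (names illustrative):
 * O{net} = (O{actual} - Min{out} + epsilon) / (Max{out} - Min{out} + epsilon) */
static double nn_scale_output(double actual, double min_out,
                              double max_out, double epsilon)
{
    return (actual - min_out + epsilon) / (max_out - min_out + epsilon);
}

/* Inverse mapping: take a net output back into the original range. */
static double nn_unscale_output(double net, double min_out,
                                double max_out, double epsilon)
{
    return net * (max_out - min_out + epsilon) + min_out - epsilon;
}
```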
A training set should consist of a list of pairs of inputs (a string of digits) and an output value (a float). For example
7710000063600011000110026401 70.0
7101010636300111001110264411 270.0
1010116363301111011112644410 220.0
0104103633311111111116444401 190.0
1041016333011110111114444412 400.0
0410113330011100111104444221 310.0
4100113300011000111014442412 230.0
1000103000310000110104424821 140.0
0000000003300000101004248811 190.0
0001000033000001010002488210 150.0
The training set is best stored in a single file. The training function takes the following arguments
(NN_Train PAIRS_LIST/INFILE OUTFILE ITERATIONS)
The PAIRS_LIST/INFILE argument may be a Lisp list of input/output pairs or the name of a file containing an (unbracketed) list as above. (The non-file option was really added for very simple tests and debugging of the system.) OUTFILE names a file where a Lisp representation of the neural net will be saved at the end of training (and at check points). That representation is suitable as input to the other neural network functions. ITERATIONS is the number of iterations this training session should perform.
Other parameters for the training may be specified in the variable nn_params. If set, this should consist of a list of pairs, each pair consisting of a parameter name and a value. The supported parameters are

n_hidden
     The number of hidden nodes in the net.
check_pt
     The number of iterations between check points.
check_pt_action
     The action(s) to take at a check point; valid values include save, error and list (the examples below use save and error).
check_pt_func
     A function to be called at each check point. It may, for example, run NN_Test on the training and test sets.
start_net
     A net representation, as saved in OUTFILE at the end of a training session or at a check point. This net is used as the starting point so training sessions can continue after being stopped. The net representation includes details of the number of iterations it has already executed. The ITERATIONS argument to NN_Train is the number of iterations required this time rather than with respect to the number of iterations already made with the starting net.
A typical training session may be started as in the following example
(define ccc ()
  (NN_Load (load "rad_sdur9"))
  (print (NN_Test "rad_sdur5.netdata"))
  (print (NN_Test "rad_sdur5.test")))
(set nn_params '((n_hidden 20)
                 (check_pt 50)
                 (check_pt_action save error)
                 (check_pt_func ccc)))
(NN_Train "rad_sdur5.netdata" "rad_sdur9" '40000)
When continuing an interrupted training session, a typical restart command set is
(define ccc ()
  (NN_Load (load "rad_sdur9"))
  (print (NN_Test "rad_sdur5.netdata"))
  (print (NN_Test "rad_sdur5.test")))
(set nn_params '((n_hidden 20)
                 (check_pt 50)
                 (check_pt_action save error)
                 (check_pt_func ccc)))
(set nn_params (cons (list 'start_net (load "rad_sdur8")) nn_params))
(NN_Train "rad_sdur5.netdata" "rad_sdur9" '40000)
The function NN_Load takes one argument, a Lisp representation of a net (as saved by a training session). It stores it as the current net--though this notion of current net is only used in testing and training, not when these nets are actually used in the duration model.
The function NN_Test takes as an argument a list of input/output pairs or the name of a file containing such a list (i.e. the same form as the first argument to NN_Train). It tests that list with respect to the current net, and returns a list of three values: the mean error, the RMS error, and the standard deviation of the error.
A number of sub-systems within CHATR use decision trees (sometimes called discrimination trees). They all have the following format
dtree: condition-node | leaf-node
condition-node: (condition true-node false-node)
true-node: dtree
false-node: dtree
condition: (featname binary-operator value) | (featname in value value value ...)
binary-operator: < > = <= >=
value: number (real) or string
leaf-node: list whose car is atomic
featname should be a feature function as defined in `src/feats.c'. The actual leaf node may be anything: a probability distribution, a particular class, a linear regression model, or whatever is appropriate. The trees are used with respect to a particular stream cell. The feature named in a condition is called for that stream cell and the result tested against the condition. If the condition is true, the function recurses and applies the true node to the stream cell; if the condition fails, it recurses and applies the false node to the stream cell. When a leaf node is found it is returned. As an example, a tree for predicting reduction on syllables is
(set reduce_tree
     '((foot in 8 9 5 7 6)
       (0.9361 0.007192 0.05675 unreduced)
       ((accented = 0)
        ((pbi < 3.5)
         (0.2895 0.6295 0.081 reduced)
         ((bi < 0.5)
          (0.3242 0.4341 0.2418 reduced)
          (0.647 0.3406 0.01244 unreduced)))
        ((pbi < 0.5)
         (0.1538 0.7692 0.07692 reduced)
         (0.8138 0.1517 0.03448 unreduced)))))
A decision tree may be easily used within C code. Note the decision is always made with respect to a Stream cell, in this case a syllable. The feature functions named should be appropriate for the stream cell type given. Refer to the following example
#include "disctree.h"

tree = list_str_eval("reduce_tree","No reduce tree");
for (s=utt_stream("Syl",utt); s != SNIL; s=SC_next(s))
{
    class = dt_decide(s,tree);   /* returns leaf node */
    if (list_sequal("reduced",list_last(class)))
        reduce_syl(s);
}
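The traversal dt_decide performs can be sketched in a self-contained form as follows; the node and lookup types here are simplified stand-ins (only the < operator is handled, and a name/value table replaces calling a feature function on a stream cell), not CHATR's actual Lisp-list representation:

```c
#include <string.h>

/* Simplified stand-in for a decision tree node (the real trees are Lisp
 * lists supporting <, >, =, <=, >= and "in"; only "<" is shown here). */
struct dnode {
    const char *feat;           /* feature name; NULL marks a leaf */
    double threshold;           /* condition: feat < threshold */
    const struct dnode *yes;    /* true-node */
    const struct dnode *no;     /* false-node */
    const char *leaf_class;     /* value returned at a leaf */
};

/* Illustrative feature lookup: a small name/value table stands in for
 * calling a feature function on a stream cell. */
struct featval { const char *name; double value; };

static double feat_lookup(const struct featval *fv, int n, const char *name)
{
    int i;
    for (i = 0; i < n; i++)
        if (strcmp(fv[i].name, name) == 0)
            return fv[i].value;
    return 0.0;                 /* unknown feature: default to 0 */
}

/* Walk the tree until a leaf node is found, then return it. */
static const char *dtree_decide(const struct dnode *t,
                                const struct featval *fv, int n)
{
    while (t->feat != NULL)
        t = (feat_lookup(fv, n, t->feat) < t->threshold) ? t->yes : t->no;
    return t->leaf_class;
}
```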
Decision trees may be created externally with any CART-like system, converted to the above Lisp format, and then used easily within CHATR.