

CHATR Variables

This appendix lists the variables used by CHATR, with a description of their values and functions. Note that in many cases these variables are unset, in which case they usually have no effect. If set, they will affect CHATR's operation.

cep_dist_params
Contains parameters for the cepstrum distance functions used in unit selection weight training (and possibly selection too). If set, this variable should contain an a-list of parameter names and values. The possible parameter names are
v_start (0)
Which parameter to start from within a vector.
v_end (16)
Which parameter to end at within a vector.
cep_no_db (0)
If 1, the following filename given to Compare_Cepstrums is a full pathname.
filetype (NUUTALK)
The file type of the cepstrum files (may also be HTK).
align_type (naive)
What time alignment should be applied to two cepstrum vector strings.
naive
No time alignment, match to shorter one.
tw
Interpolate selected to target.
dtw
Dynamic time warping (not yet implemented).
frame_sds
Standard deviations for each parameter in a frame. Used in weighted Euclidean distance.
frame_weights
Weight for each parameter in a frame. Used in weighted Euclidean distance.
dist_type (euclidean)
Distance metric used for frame comparisons.
euclidean
Simple Euclidean distance (squared error).
weighted_euclidean
The difference is divided by the sd squared and multiplied by the weight. HACK: param 0 is assumed to be F0, where 0 means unvoiced.
mahalanobis
Not yet implemented.
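As an illustration of the expected shape (the parameter values here are arbitrary; anything not mentioned keeps its default):

     (set cep_dist_params
          '((v_start 1)
            (v_end 12)
            (filetype HTK)
            (align_type naive)
            (dist_type euclidean)))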
chatr_confirm_exit
If value is set and non-nil, CHATR will prompt for confirmation before exiting.
chatr_hush_startup
If value is set and non-nil, CHATR will not display the startup copyright message.
chatr_max_clients
If set to an integer, limits the number of clients to that number. If unset (or nil), no limit is specified.
chatr_secure_functions
A list of function names that may be called at top level while in server mode. If set to a non-nil value, only the functions named in this list may be called by a client program. This is for security: although CHATR should be run as user nobody in server mode, it should only be used for synthesis--not for devious things. Note that `set' cannot be included in the list, as a client could then simply change the list itself. Neither should defvar, define, system or Audio be included. Some would say even this is not secure enough.
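For example, to restrict clients to basic synthesis operations (this particular choice of functions is only an illustration):

     (set chatr_secure_functions '(Utterance Synth Say))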
chatr_server_portnum
If set to an integer, it is used as the port on which CHATR listens in server mode. Note this has to be set either in init.ch, the user's .chatrrc file or a file loaded on the command line, as the number is used before anything is read interactively.
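For example, in a .chatrrc file (the port number here is arbitrary):

     (set chatr_server_portnum 5500)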
dumbplus_params
If a unit selection synthesizer is being used and the concat method is set to DUMB+, this variable defines various parameters for the DUMB+ module. The value should be an a-list of parameter names and values. Currently supported parameters are (defaults shown in parentheses)
strategy (mds)
Defines join point strategy.
z_crossings
Join at zero crossings (ignoring direction).
mds
Minimal distance splicing. Look at a short window of samples to find closest fit. (This is quite good.)
dumb
Just butt them, *no* modification.
breaks
Add break_size in ms (default 100) between each unit.
mds_search_window_ms (4.0)
Number of ms of the unit in which to look for the mds join point.
mds_diff_wind_samp (7)
Number of sample points to test for minimal distance.
pm_align (OFF)
If ON will first align unit edges to pitch marks before join strategy is applied.
break_size (100)
Size of pause in ms between units when the breaks strategy is used.
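A hypothetical setting (values are illustrative; unspecified parameters keep their defaults) might be:

     (set dumbplus_params
          '((strategy mds)
            (mds_search_window_ms 4.0)
            (pm_align ON)))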
dur_lr_model
When Duration_Method is set to LR_DUR, this variable should hold a pair: a list of features, and a decision tree whose leaves are linear regression models. The number of weights in each model should equal the number of characters obtained when the values of all the features are concatenated (so those values must all be digits).
etc-dir
A directory containing CHATR-specific executables. This is initialized to <chatr_lib>/.../etc, though for certain installations it may need to be changed.
*features*
Contains a list of atoms identifying features of this particular installation (cf. Common Lisp's *features* variable). This variable can be used to check the availability of various features, e.g. NIST SPHERE support, (direct) DAT-Link support, CSTR diphones etc.
feature_maps
A rather crude way that some systems can map category features to binary ones. This consists of a list of names and a list of members. If a feature map is applied to a feature value, and that value is a member of the specified set, then the feature value becomes 1, otherwise 0. This is used for F0 prediction in the lr models. An example is
     (set feature_maps
        '((tobi_accent_0 H*)
          (tobi_accent_1 !H*)
          (tobi_accent_2 L*)
          (tobi_accent_3 L+H* L+!H* H+!H* L*+!H L*+H)
          (tobi_accent_4 *? * X*?)))
f0_no_jitter
If set to non-nil, no jitter will be added to generated F0s. The method used to generate jitter is random, but not quite in the right way, so most speakers set this to 't and no jitter is generated.
HLP_Pattern
HLP_Patterns (and HLP_Rules) must be set before HLP functions will work. HLP functions are used to predict intonation parameters from discourse labeled input. HLP_Patterns add features (based on IFT feature values) on categories dominated by (CAT S) labeled nodes. HLP_Patterns consist of a list of pattern rules. Refer to the User Guide for more details.
HLP_prosodic_strategy
The value of this variable defines the strategy to be used to predict intonation event positions. This is used for discourse input, including tts. If unset the strategy defaults to Hirschberg. Possible values are
Hirschberg
Predict accent position based on a heuristic based algorithm.
Monaghan
A phrase based algorithm (not as fully implemented as Hirschberg).
DiscTree
Use a decision tree to predict accent position (0.7 old version of trees).
None
Do not do any prediction. Either features already exist in the input that can be used to realise accents in input, or you just don't want any predicted.
HLP_phr_disc_tree
A decision tree for predicting phrase boundaries when HLP_phrase_strategy is set to DiscTree. It should return the value 0, 1, 2, 3 or 4 for a given word. Note that at this point in the synthesis process only some features will work: no syllable, phoneme or intonation prediction has taken place yet, as phrase prediction must happen before those can succeed.
HLP_phrase_strategy
The value of this determines which phrase prediction algorithm should be used in HLP processing and TTS. Possible values are
Bachenko_Fitzpatrick
Use the Bachenko and Fitzpatrick algorithm.
DiscTree
Use a decision tree to predict boundaries. See HLP_phr_disc_tree for the tree.
None
Don't predict anything. (Though input in HLP mode may already contain explicit phrase marking.)
HLP_realise_strategy
Defines the method used to realise predicted prominence as accents. If the value is Simple_Rules, HLP_Rules (and HLP_Patterns) are used to realise prominence and phrasing into the specified accents. If unset, this method is still used, but ToBI and JToBI ignore it and re-predict accents based on their own methods. Setting this to Simple_Rules and using PhonoWord input allows you to completely bypass the ToBI (and JToBI) accent and boundary prediction methods, thus allowing hand control of them.
HLP_Rules
HLP_Rules (and HLP_Patterns) must be set before HLP functions will work. HLP functions are used to predict intonation parameters from discourse labeled input. HLP_Rules are the first stage in adding default features to existing feature categories in the Sphrase structure. The value of HLP_Rules should be a list of defaults. Defaults are of the form
      ( FLIST1 -> FLIST2 )
where FLIST1 and FLIST2 are feature lists. If a category contains all the features in FLIST1, the features in FLIST2 are added if they do not already exist in the category. Later extensions should allow variables and conditions in these patterns.
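As a sketch of the form only (the feature names in FLIST2 here are purely hypothetical):

     (set HLP_Rules
          '(( ((CAT S)) -> ((mood decl)) )))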
jlts_no_unvoiced
For Japanese text to speech. If set to true, U and I in romaji input are converted to their voiced versions u and i. If unset or nil, the unvoiced phones are left as is. (This implies udb synth method or whatever can deal with them.)
KDD_full_kan2rom
If set to non-nil, the romaji input for Japanese synthesis is assumed to come from the KDD conversion program. The consequence is that in the KDD case, numerals are treated as break levels rather than numbers.
load-path
A list of directories that are searched for files when using load_library (and some other standard functions). This follows the usage of load-path in Emacs. It is by default initialized to the standard library directory as defined when CHATR is installed.
lexicon_syllabify
If non-nil during lexicon compilation, the entries are assumed to be unsyllabified, with vowels terminated by 0, 1 or 2 denoting stress. CHATR will automatically syllabify and extract the stressing. This deals with the format in which the BEEP and CMU lexicons are distributed.
mb_params
Parameters for Beckman and Pierrehumbert Japanese Intonation module. Users can set values through this variable to affect the operation. The value should be an a-list of parameter names and values. Currently supported parameter names are (defaults shown in parentheses)
phrase_top (180Hz)
refval (90Hz)
hamwin_size (240ms)
Length of smoothing window.
PhrHProm (0.8)
Default prominence for H-.
WeakLParam (0.85)
Prominence of weak L% relative to strong L%.
UPHRASELProm (1.0)
Default prominence of an utt-final L%.
DPHRASELProm (1.0)
Default prominence of an absolute utt-final L%.
IphrLProm (0.9)
Default prominence of a medial Iphrase boundary L%.
AccPLProm (0.8)
Default prominence of a mere acc phrase boundary L%.
KernLProm (0.7)
Relative prominence of L in H*+L accent.
declinAmount (0.01)
Declination amount over utterance.
finLowAmount (0.1)
Final lowering constant, stated as the ratio to reduce by (i.e. the default means reduce to 90% of what would otherwise be expected by the end).
PhraseDownstep (0.8)
Down step amount between AccentP.
target_method (rule)
Defines whether rules or linear regression (lr) is used to predict F0 target values.
target_f0mean
If target_method is lr, this value is used to map the model pitch range onto the target speaker's pitch range. This should be the mean F0 for all vowels in a significant sample of speech (or the whole database if possible).
target_f0std
The standard deviation of the speaker's F0 pitch range, taken from all vowels. This is used to map the lr F0 model pitch range to a particular speaker's range.
An example would be
     (set mb_params 
          '((phrase_top 355)
            (refval 185)
            (hamwin_size 300)))
Parameters not given a value will be set to their default.
nn_params
Parameters for neural network training. If set, this should be an a-list of parameter names and values. Current parameters are
n_hidden N
Number of hidden units (default 10).
check_pt N
Number of iterations between check points.
check_pt_func func
Lisp function to be run (no arguments) at check point.
check_pt_actions LIST
What to do when a check point occurs. There are three possible actions, all or none may be selected
save
Save the current net in the output file.
error
Display the mean error at this point.
list
Display one cycle of input and output vectors.
start_net NNet
Lisp description of a net. This is used as a starting point. It also allows training to start with a partially trained net. Example use is
     (set nn_params '((n_hidden 5) (check_pt 1000)
                      (check_pt_actions save error)
                      (i_type binary))).
nnd_nets
Neural net descriptions and types for predicting syllable and phoneme durations. This should be a list of four items
SYL_ITYPE
List of features (for syls) defining input to SYL_NET.
SYL_NET
Neural network, as saved by NN_Train, for syllable durations.
PH_ITYPE
List of features (for segs) defining input to PH_NET.
PH_NET
Neural network for phoneme durations, as saved by NN_Train.
nnd_params
Parameters for neural network duration module. If set, this should be an a-list of parameter names and values. Current parameters are
syl_stretch N
A float which the predicted syllable duration is multiplied by, for globally changing durations.
phoneonly 0
If 1, the syllable net is ignored and it is assumed the phone net alone can do the work.
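For example, to lengthen all predicted durations by 10% while still using the syllable net (the values here are illustrative only):

     (set nnd_params '((syl_stretch 1.1) (phoneonly 0)))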
no_smooth
If set to non-nil, an intonation system that uses smoothing (ToBI and JToBI) will not smooth the target values.
NT_cep_gc_strategy
In original NUUTALK, this variable controls the garbage collection strategy for unit cepstrum files. (This is not very useful when acoustic costs are done via a vq table.) When cepstrum distance measurements are used, many cepstrum files are read; this variable specifies the size of the cache for keeping them. It may be set to a number, or to NONE (the default) if no caching is required. A typical value is 500.
NT_cost_type
In original NUUTALK unit selection, this variable determines the acoustic cost function used in unit selection. If set to cep_dist, it uses a Euclidean distance. If set to vq_dist it uses vector quantization matching. Vector quantization is much faster than cepstrum distance, but the current database must have vq information for this to work.
nuu_female_f0
If set to non-nil value, causes NUUTALK f0 values to be increased by a factor of 1.7. This is a quick solution to generating Japanese female intonation.
nus_params
In UDB unit synthesis in NUS mode, value of this variable (an a-list) specifies the weighting for the cost function for scoring unit selections. The possible values are
exclude_list <list of file ids>
Units from these files are excluded from the selection process.
beam_width <num>
Number of candidates to carry forward at each segment.
cand_width <num>
Number of new candidates to consider at each segment.
context_wt <num>
Weighting for segmental context.
join_wt <num>
Weighting for acoustic join in unit score function.
pros_wt <num>
Weight for overall prosody (power, pitch and duration).
power_wt <num>
Weighting for power.
dur_wt <num>
Weighting for segmental duration.
pitch_wt <num>
Weighting for F0 pitch.
zdist_fact 1.0
When using zscores, the targets are multiplied by this factor to reduce the extreme cases and hopefully result in more average units. A value of 0.0 causes selection of mean pitch, power and duration. Larger factors allow the actual targets to influence the selection.
All of the above have defaults if left unspecified. The overall cost function is
     (((dur + pitch + power)/3 + context)/2 + join)
Thus if a weighting is set to 0.0, that feature is ignored. If increased with respect to other weights, the feature will count for more.
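A hypothetical setting (the weights and widths here are illustrative only; unset parameters keep their defaults) that ignores duration in the cost might be:

     (set nus_params
          '((beam_width 20)
            (cand_width 20)
            (join_wt 1.0)
            (pros_wt 1.0)
            (dur_wt 0.0)))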
nus_phones
This variable is used when compiling unit database indexes (using the command Database Units). The value should be a list of entries, one for each phoneme in the database phoneme set. The entries contain the phone name plus the mean and standard deviation for duration, pitch, voicing and power. Each entry should consist of the following fields
     phone_name   mean_dur   dur_sd   mean_pitch   pitch_sd 
     mean_voice   voice_sd   mean_power   power_sd
The entries are optional but are necessary for any database to use the Generic selection strategy (which means they are pretty mandatory).
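An illustrative sketch showing a single entry (a real list needs one entry per phoneme; the values are borrowed from the udb_nus_phones example below):

     (set nus_phones
          '((a 89.612 29.045 108.090 41.588 0.876 0.211 1369.200 903.460)))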
pause_prediction_method
Determines the method used for pause prediction. Possible values are
by_phrase_break
Pauses will be inserted after phrase breaks if the pause size for the level set by Stats Pause is non-zero.
disctree
Use the decision tree in variable pause_prediction_tree.
pause_prediction_tree
A decision tree used when pause_prediction_method is set to disctree. This tree predicts existence of a pause or not for words. An example is in lib/data/tobi.ch.
power_modify
If set to a non-nil value, will cause power modification of selected units when the synth method is UDB. Note this will only modify power for target units that have values other than 0.0. If no power prediction module is used in synthesis, it will only be useful when values are provided by other means, e.g. natural units.
ps_params
This sets up parameters for the PS_PSOLA module used for unit concatenation. The value should be an a-list of parameter names and values. Currently supported parameters are (defaults shown in parentheses)
x_pitch (1.0)
Global pitch modification factor.
x_duration (1.0)
Global duration modification factor.
modify_power (no)|yes
Modify power through a segment (poor).
pitch_min_delta (0.0)
If pitch change is less than this ratio, make no pitch modification.
pitch_max_delta (1.0)
If pitch change is greater than this only make this amount of change (1.0 == 100%).
pitch_delta (1.0)
Amount to change pitch by between target and selection. 1.0 means full change, 0.0 means no change, 0.5 means move 50% towards the target value.
dur_min_delta (0.0)
If duration change is less than this ratio, make no duration modification.
dur_max_delta (1.0)
If duration change is greater than this, only make this amount of change (1.0 == 100%).
dur_delta (1.0)
Amount to change duration by between target and selection. 1.0 means full change, 0.0 means no change, 0.5 means move 50% towards the target value.
percent_win_dim (2.0)
Size of Hanning window in pitch periods.
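For example, a hypothetical setting (values are illustrative; unspecified parameters keep their defaults) that lengthens all durations slightly and moves pitch only halfway towards the target:

     (set ps_params
          '((x_duration 1.1)
            (pitch_delta 0.5)))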
reduce_tree
A decision tree for syllables to predict whether the vowel should be reduced.
schwas
A mapping of full vowel to reduced form. This should consist of an a-list indexed by phone set name to a-lists of full vowel to reduced form. An example is in lib/data/reduce.ch.
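As a sketch of the expected shape only (the phone set name and vowel mappings here are purely illustrative; see lib/data/reduce.ch for a real example):

     (set schwas
          '((mrpa ((i @) (u @)))))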
syn_params
This sets up parameters for the waveform synthesis process. The value should be an a-list of parameter names and values. Currently supported parameters are (defaults shown in parentheses)
phrase_by_phrase (NIL)
Any non-nil value causes synthesis of an utterance to be chunk by chunk. A chunk is defined as being a string of segments terminated by a silence (or end of utterance). If this variable is nil (or unset), the other parameters have no effect.
whole_wave (t)
Means the whole wave, made by concatenating the chunks (separated by silences), is returned. In tts mode it is useful to set this to nil and use synth_hook to say each individual chunk.
silence_method zeros
Generate the silence in a wave of zeros.
natural
Let the synthesizer method do it (i.e. in udb mode let them be selected from the db).
delete
Don't create them at all.
noise
Generate silence with some noise (not implemented).
hardware_silence 0
Length in milliseconds of the time it takes (roughly) for the audio output hardware to start playing a waveform after it has been given one. DAT-Links, for example, add a delay of 750 ms. (This parameter should affect the splitting and generation of silence, but hasn't been implemented yet.)
These parameters were designed to solve two problems, latency in start-up time for the first sentence to be synthesized in TTS, and bad distribution of silences in most of our databases.
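For example, a hypothetical tts setup (values illustrative) might synthesize chunk by chunk, let the udb synthesizer select natural silences, and rely on synth_hook to say each chunk:

     (set syn_params
          '((phrase_by_phrase t)
            (whole_wave nil)
            (silence_method natural)))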
synth_hook
If value is a function or list of functions, run these function(s) on an utterance immediately after a waveform has been generated (i.e. in the C function synthesis()). These function(s) should take an utterance as their only argument. For example, if you wished to normalize the gain and add an echo to all synthesized utterances
     (define echo (utt) (Filter_Wave utt 'Delay))
     (set synth_hook (list Regain echo))
ToBI_accent_tree
When ToBI intonation method is selected, this must contain a decision tree for syllables to predict accents.
ToBI_boundary_tone_tree
When ToBI intonation method is selected, this must contain a decision tree for syllables to predict boundary tones (and pitch accents).
tobi_lrf0_model
When ToBI intonation method is selected and a target method is set (in ToBI_params), this should contain a four item list. The first three items are linear regression models for predicting F0 at the start, middle and end points of syllables. Each model consists of a list of elements; an element consists of a feature name, a weight, and optionally a feature map name (see feature_maps). The fourth item in the list is a parameter list. There are only two possible parameters, model_f0mean and model_f0std; these should contain the overall mean and standard deviation of the speaker from which the model was built, to allow mapping to other speakers' pitch ranges.
ToBI_params
When ToBI intonation method is selected, these parameters affect various aspects of the F0 generation process. If set, this should be an a-list of parameter names and values. Current parameters are
pitch_accents
A list of pitch accents valid for this version (these must be specified even though they are currently only used for validation).
phrase_accents
A list of phrase accents (H- L-).
boundary_tones
A list of boundary tones (actually phrase+boundary) H-H% L-H% L-L% H-L%.
target_method
If set to lr, linear regression is used to generate the F0 contour. If set to apl, the multi-factor method (Anderson, Pierrehumbert and Liberman) is used.
target_f0mean
If target_method is lr, this value is used to map the model pitch range onto the target speaker's pitch range. This should be the mean F0 for all vowels in a significant sample of speech (or the whole database if possible).
target_f0std
The standard deviation of the speaker's F0 pitch range, taken from all vowels. This is used to map the lr F0 model pitch range to a particular speaker's range.
The following are only used if target_method is unset, or set to apl.
topval
Step (in Hz) above the reference line for use as reference when calculating target points (i.e. H* etc).
baseval
Step (in Hz) below the reference line for use as reference when calculating target points (i.e. L* etc).
refval
Start point (in Hz) for reference line.
h1
Factor of topval for uprise before H*.
l1
Factor of baseval for downstep before L*.
prom1
Factor (times top/baseval) for magnitude of H*/L* (above/below ref. line). Also for endpoint in H-H%/L-L% boundary tones.
prom2
Factor (times top/baseval) for magnitude of H-/L- (above/below ref. line), when immediately followed by an opposite boundary tone (L%/H%).
prom3
Factor (times top/baseval) for magnitude of H-/L- (above/below ref. line), when not immediately followed by a boundary tone.
HiF0_factor
Factor to increase H*'s when marked with HiF0 (default is 1.3).
decline
Declination as a drop factor per millisecond, i.e. 0.01 means drop to 99% every millisecond. (This only applies if decline_range has a value other than 0.0.)
decline_range
Number of Hz the ref. line should drop over a phrase (i.e. a phrase ended by a phrase accent or boundary tone).
The following is used for all target_methods.
hamwin_size
Size of Hamming Window used in smoothing the F0 made from the target points.
The lr method of target generation is much easier to deal with than the explicit setting of values. A typical example is
     (set ToBI_params
         '((target_method lr) 
           (target_f0mean 113)
           (target_f0std 31)
           (hamwin_size 240)))
These values can also be created automatically at database build time.
udb_all_means
If set to a non-nil value, segments are set with mean pitch power and duration for that particular phoneme before unit selection.
udb_nus_phones
If set, this is included in UDB databases at compile time. It contains a list of phonemes with mean and standard deviation values for duration, pitch, power and voicing for each one. This table allows Z score references in a database. There are two formats: an old one, which you should not use, and a new format which is simply a list of entries, one for each phoneme. Each entry consists of 9 fields as follows
     phone_name   duration_mean   duration_sd
     pitch_mean   pitch_sd        voicing_mean
     voicing_sd   power_mean      power_sd
The phone_name should be a phoneme in the phoneme set of the database to be compiled. The other values should be floating point numbers. An example entry is (there should be one entry for each phone)
     (a 89.612 29.045 108.090 41.588 0.876 0.211 1369.200 903.460)
udb_prune_units
List of unit numbers to be pruned from a database, used at udb index compile time. It should be an a-list of phones, each with a list of unit numbers to be removed from the index. Note the entries themselves will not be removed (as they are part of other units' contexts), but they will never be selected.
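For example, to stop units 5 and 17 of phone a and unit 2 of phone i from ever being selected (phone names and unit numbers here are illustrative only):

     (set udb_prune_units '((a 5 17) (i 2)))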
utt_hook
If value is a function or list of functions, run these function(s) on an utterance as it is generated (via the Utterance function). For example, if you wish all utterances to be synthesized and said automatically without explicit calls, then
      (set utt_hook (list Synth Say))

