

A Journey Through CHATR

CHATR is a vast program containing a large number of parallel modules, any of which may or may not be selected by the user at any one time. For the newcomer to CHATR, it is difficult to establish where the system begins. With that in mind, the following sections describe a possible route through the software, module by module, for several different inputs.

The following diagram shows each module, referred to by its software name.

      ----------------    -----------    ----------
     |PhonoWord Tagged|  |Plain  Text|  |HLP Tagged|
     |   Utterance    |  |   Input   |  |Utterance |
      ----------------    -----------    ---------- 
             |                 |             |
             |                 |             |
             |                 |             |
      PhonoWord_input      text_input    hlp_input
             |                 |             |
             |                  -----   -----
             |                       | |
             |                    hlp_module
             |                        |
              ------------   ---------
                          | |
                      word_module
                           |
                           |
                    phonology_module
                           |
                           |
                     intone_module
                           |
                           |
                    duration_module
                           |
                           |
                   int_target_module
                           |
                          \|/
                       synthesis

By default CHATR will eventually run any utterance through the same five modules. Prior to this, the input will be applied to either a function or a function followed by a module, depending on the input type. See section Design Philosophy, for a definition of the difference between function and module.

The PhonoWord_input Function

The PhonoWord_input function is in file

     ~chatr/src/input/pw_input.c

This function creates two streams: the WordStream and the PphraseStream.

The PhonoWord_input function calls the build_phrase_tree function. This in turn calls the make_new_word and build_sub_phrases functions, which cycle as often as required. Each time the build_phrase_tree function encounters a word in the given utterance, the make_new_word function is called. This adds the new word (plus the intones and features if present) to the WordStream. Thus for the PhonoWord input

     (Utterance
      PhonoWord
      (:D ()
          (:S ()
              (:C ()
                  (you (H*))
                  (can)
                  (pay))
              (:C ()
                  (for)
                  (the)
                  (hotel (H*)))
              (:C ()
                  (with)
                  (a)
                  (credit)
                  (card (H*) (L-L%)))))),

execution of the PhonoWord_input function creates the WordStream

(<you> -------- intones ------ ((H*))
 <can>
 <pay>
 <for>
 <the>
 <hotel> ------ intones ------ ((H*))
 <with>
 <a>
 <credit>
 <card>) ------ intones ------ ((H*)(L-L%))

The information contained in the intones field of each word cell is used later to build the IntoneStream.
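The traversal described above can be sketched in a few lines of Python. This is only an illustration, not CHATR's C code: the tuple layout (a phrase is `(label, features, children)`, a word is `(name, intones)`) and the name build_word_stream are assumptions made for the example.

```python
# Hypothetical rendering of the PhonoWord utterance above.
# Phrase node: (label, features, children). Word node: (name, intones).
tree = (":D", [], [
    (":S", [], [
        (":C", [], [("you", ["H*"]), ("can", []), ("pay", [])]),
        (":C", [], [("for", []), ("the", []), ("hotel", ["H*"])]),
        (":C", [], [("with", []), ("a", []), ("credit", []),
                    ("card", ["H*", "L-L%"])]),
    ]),
])


def build_word_stream(node, words=None):
    """Walk the phrase tree depth first and collect one cell per word."""
    if words is None:
        words = []
    if len(node) == 3:
        # Phrase node: descend into each sub-phrase or word.
        for child in node[2]:
            build_word_stream(child, words)
    else:
        # Word node: add a new word cell, keeping its intones.
        name, intones = node
        words.append({"word": name, "intones": list(intones)})
    return words


word_stream = build_word_stream(tree)
for cell in word_stream:
    print(cell["word"], cell["intones"])
```

Words are collected in utterance order, so the resulting list has the same shape as the WordStream shown above.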

The PphraseStream creation description will be written later. (Like when I find out how it's done.)

The text_input Function

The text_input function is in file

     ~chatr/src/input/hlp_input.c

This function creates two streams: the WordStream and the SphraseStream.

It also converts the text input into an HLP input.

The text_input function calls two further functions: hlp_build_sphrase (cycling as often as required), which in turn calls the function hlp_make_word; and text_to_hlp, which in turn calls the functions text_read_sentence and text_build_phrased. These functions are in file

     ~chatr/src/text/text.c

The text_to_hlp function converts the text input into an HLP input. First the input text is read, sentence by sentence, by the text_read_sentence function. The text_build_phrased function then builds LEX word-cells for each sentence, and adds an IFT type to the beginning. Thus for the text input

     (Utterance Text
      "You can pay for the hotel with a credit card."),

execution of the text_to_hlp function creates the HLP input

     (Utterance HLP
       (((CAT D))
        (((CAT S) (IFT Statement))
         (((LEX You)))
         (((LEX can)))
         (((LEX pay)))
         (((LEX for)))
         (((LEX the)))
         (((LEX hotel)))
         (((LEX with)))
         (((LEX a)))
         (((LEX credit)))
         (((LEX card))))))
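The conversion can be pictured with the following Python sketch. It is not the CHATR implementation: the nested-list layout mirrors the HLP listing above, but the function body, the whitespace tokenisation and the rule that guesses the IFT type from the final punctuation mark are all assumptions made for illustration.

```python
def text_to_hlp(text):
    """Turn one plain-text sentence into a flat HLP-style nested list.

    Each node is [features, child, child, ...]; features is a list of
    (name, value) pairs. Word nodes have no children.
    """
    sentence = text.strip()
    # Assumption: the IFT type is guessed from the final punctuation only.
    ift = "Question" if sentence.endswith("?") else "Statement"
    words = sentence.rstrip(".?!").split()
    sentence_node = [[("CAT", "S"), ("IFT", ift)]]
    for word in words:
        sentence_node.append([[("LEX", word)]])   # one LEX word-cell per word
    return [[("CAT", "D")], sentence_node]


hlp = text_to_hlp("You can pay for the hotel with a credit card.")
print(hlp[1][0])      # features of the sentence node
print(hlp[1][1:])     # the ten LEX word-cells
```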

The WordStream is created by the function hlp_make_word. Since there is no other information provided by plain text input besides words, the intones fields of each word are set to nil. Similarly, the features fields are filled with only a LEX word-cell. Thus for the same text input as above, execution of the hlp_make_word function creates the WordStream

            ---> intones ----> NIL
<you> ----<
            ---> features ---> ((LEX you))

            ---> intones ----> NIL
<can> ----<
            ---> features ---> ((LEX can))

            ---> intones ----> NIL
<pay> ----<
            ---> features ---> ((LEX pay))

            ---> intones ----> NIL
<for> ----<
            ---> features ---> ((LEX for))

            ---> intones ----> NIL
<the> ----<
            ---> features ---> ((LEX the))

            ---> intones ----> NIL
<hotel> --<
            ---> features ---> ((LEX hotel))

            ---> intones ----> NIL
<with> ---<
            ---> features ---> ((LEX with))

            ---> intones ----> NIL
<a> ------<
            ---> features ---> ((LEX a))

            ---> intones ----> NIL
<credit> -<
            ---> features ---> ((LEX credit))

            ---> intones ----> NIL
<card> ---<
            ---> features ---> ((LEX card))

Finally, the hlp_build_sphrase function builds the SphraseStream.

The hlp_input Function

The hlp_input function is in file

     ~chatr/src/input/hlp_input.c

This function creates two streams: the WordStream and the SphraseStream.

The hlp_input function calls the hlp_build_sphrase function, which cycles as often as required. This in turn calls the hlp_make_word function. Since HLP input does not include intonation information, the intones field of the WordStream is set to nil. The features field is filled with all the syntactic and prosodic information of the relevant word. Thus for the HLP input

     (Utterance HLP
      (((CAT S) (IFT Statement))
       (((CAT NP) (LEX you) (Focus ++)))
       (((CAT VP))
        (((CAT Aux) (LEX can)))
        (((CAT Verb) (LEX pay)))
        (((CAT PP))
         (((CAT Prep) (LEX for)))
         (((CAT NP))
          (((CAT Det) (LEX the)))
          (((CAT Noun) (LEX hotel) (Focus ++)))))
        (((CAT PP))
         (((CAT Prep) (LEX with)))
         (((CAT NP))
          (((CAT Det) (LEX a)))
          (((CAT Adj) (LEX credit)))
          (((CAT Noun) (LEX card) (Focus ++)))))))),

execution of the hlp_input function creates the WordStream

            ---> intones ----> NIL
<you> ----<
            ---> features ---> ((CAT NP)(LEX you)(Focus ++))

            ---> intones ----> NIL
<can> ----<
            ---> features ---> ((CAT Aux)(LEX can))

            ---> intones ----> NIL
<pay> ----<
            ---> features ---> ((CAT Verb)(LEX pay))

            ---> intones ----> NIL
<for> ----<
            ---> features ---> ((CAT Prep)(LEX for))

            ---> intones ----> NIL
<the> ----<
            ---> features ---> ((CAT Det)(LEX the))

            ---> intones ----> NIL
<hotel> --<
            ---> features ---> ((CAT Noun)(LEX hotel)(Focus ++))

            ---> intones ----> NIL
<with> ---<
            ---> features ---> ((CAT Prep)(LEX with))

            ---> intones ----> NIL
<a> ------<
            ---> features ---> ((CAT Det)(LEX a))

            ---> intones ----> NIL
<credit> -<
            ---> features ---> ((CAT Adj)(LEX credit))

            ---> intones ----> NIL
<card> ---<
            ---> features ---> ((CAT Noun)(LEX card)(Focus ++))

Finally, the hlp_build_sphrase function builds the SphraseStream.

The hlp Module

The `hlp' module is in file

     ~chatr/src/input/hlp.c

hlp_module calls five functions and one module in the following order

     hlp_apply_default_rules
     hlp_phr_module
     hlp_predict_pros_events
     hlp_rephrase
     add_boundaries
     hlp_realise_accents

Each of the above will now be described in detail.

The hlp_apply_default_rules Function

The hlp_apply_default_rules function calls the hlp_traverse_add_defaults function which further calls the hlp_apply_rule function. Both called functions cycle as often as necessary.

The function hlp_apply_default_rules sequences through the HLP tree input (an HLP input can be seen as a tree) and tries to apply the user-defined rules. An example of such rules (contained in the HLP_Rules variable) is

     ( ( ((Focus +)) => ((NAccent +)) )
       ( ((Focus ++)) => ((NAccent ++)) )
       ( ((Contrastive +)) => ((NAccent ++)) )
       ( ((Focus -)) => ((NAccent -)) )
       ( ((CAT S)) => ((PhraseLevel :S)) ) )

Every element contained in each features field is looked at. If an element matches an expression on the left side of the HLP_Rules list, the expression on the right is added to the features field by the hlp_apply_rule function.
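As a rough illustration of this matching step, the rule list can be modelled in Python as pairs of (left, right) features. The sketch below is not CHATR code. Placing the new feature at the front of the list follows the WordStream listings in this section; skipping a feature that is already present is an assumption.

```python
# The example HLP_Rules, written as (left-hand side, right-hand side) pairs.
HLP_RULES = [
    (("Focus", "+"), ("NAccent", "+")),
    (("Focus", "++"), ("NAccent", "++")),
    (("Contrastive", "+"), ("NAccent", "++")),
    (("Focus", "-"), ("NAccent", "-")),
    (("CAT", "S"), ("PhraseLevel", ":S")),
]


def apply_default_rules(features, rules=HLP_RULES):
    """Return a new features list with the right-hand sides of matching rules added."""
    result = list(features)
    for feature in features:
        for left, right in rules:
            # Assumption: a feature that is already present is not added twice.
            if feature == left and right not in result:
                result.insert(0, right)
    return result


you = apply_default_rules([("CAT", "NP"), ("LEX", "you"), ("Focus", "++")])
can = apply_default_rules([("CAT", "Aux"), ("LEX", "can")])
print(you)
print(can)
```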

For Text input, since there is no information in the features fields to which to apply rules, execution of this function will not change the WordStream.

For HLP input, the WordStream becomes

            ---> intones ----> NIL
<you> ----<
            ---> features ---> ((NAccent ++)(CAT NP)(LEX you)(Focus ++))
                                 ^^^^^^^^^^

            ---> intones ----> NIL
<can> ----<
            ---> features ---> ((CAT Aux)(LEX can))

            ---> intones ----> NIL
<pay> ----<
            ---> features ---> ((CAT Verb)(LEX pay))

            ---> intones ----> NIL
<for> ----<
            ---> features ---> ((CAT Prep)(LEX for))

            ---> intones ----> NIL
<the> ----<
            ---> features ---> ((CAT Det)(LEX the))

            ---> intones ----> NIL
<hotel> --<
            ---> features ---> ((NAccent ++)(CAT Noun)(LEX hotel)(Focus ++))
                                 ^^^^^^^^^^

            ---> intones ----> NIL
<with> ---<
            ---> features ---> ((CAT Prep)(LEX with))

            ---> intones ----> NIL
<a> ------<
            ---> features ---> ((CAT Det)(LEX a))

            ---> intones ----> NIL
<credit> -<
            ---> features ---> ((CAT Adj)(LEX credit))

            ---> intones ----> NIL
<card> ---<
            ---> features ---> ((NAccent ++)(CAT Noun)(LEX card)(Focus ++))
                                 ^^^^^^^^^^

The hlp_phr Module

hlp_phr_module predicts phrasing using either the default or a user-selected method. Two methods are presently available; it will be assumed here that the default, DiscTree, is selected.

The module hlp_phr_module calls the disc_tree_phrase function which in turn calls the function dt_decide.

The DiscTree phrasing prediction method takes each word and determines its break index. The break index is a measure of how strongly a particular word is linked to the previous one. Possible values are 1, 2, 3 or 4. A break index of 1 indicates the two words are closely linked--such as the `the' and `hotel' in the example currently being used. A break index of 4 means that the words are very disassociated. These are usually (but not solely) the ending and beginning words of successive sentences.

The dt_decide function returns the break index for each word. It looks at the type of preceding and succeeding words and uses a decision tree to determine a value. Currently only values 1 or 4 are utilized. Thus 4 does not only indicate the end of a sentence, but also marks pauses within sentences.

When a break index 4 is returned, the disc_tree_phrase function adds `PhraseLevel :C' to the features field of the relevant word.
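The control flow can be sketched as follows. The real decision tree used by dt_decide is not reproduced in this section, so the function decide_break_index below is a deliberately crude stand-in (a break before any word in a small, hypothetical preposition list); only the marking step reflects the description above.

```python
# Hypothetical stand-in for the decision tree: break before a preposition.
PREPOSITIONS = {"for", "with"}


def decide_break_index(prev_word, word):
    """Return 4 (strong break) or 1 (closely linked), the two values in use."""
    if prev_word is not None and word in PREPOSITIONS:
        return 4
    return 1


def disc_tree_phrase(word_stream):
    """Mark every word that follows a break index of 4 with PhraseLevel :C."""
    prev_word = None
    for cell in word_stream:
        if decide_break_index(prev_word, cell["word"]) == 4:
            cell["features"].insert(0, ("PhraseLevel", ":C"))
        prev_word = cell["word"]
    return word_stream


words = "you can pay for the hotel with a credit card".split()
stream = disc_tree_phrase([{"word": w, "features": [("LEX", w)]} for w in words])
marked = [c["word"] for c in stream if ("PhraseLevel", ":C") in c["features"]]
print(marked)
```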

For Text input, execution of this module changes the WordStream to

            ---> intones ----> NIL
<you> ----<
            ---> features ---> ((LEX you))

            ---> intones ----> NIL
<can> ----<
            ---> features ---> ((LEX can))

            ---> intones ----> NIL
<pay> ----<
            ---> features ---> ((LEX pay))

            ---> intones ----> NIL
<for> ----<
            ---> features ---> ((PhraseLevel :C)(LEX for))
                                 ^^^^^^^^^^^^^^ 

            ---> intones ----> NIL
<the> ----<
            ---> features ---> ((LEX the))

            ---> intones ----> NIL
<hotel> --<
            ---> features ---> ((LEX hotel))

            ---> intones ----> NIL
<with> ---<
            ---> features ---> ((PhraseLevel :C)(LEX with))
                                 ^^^^^^^^^^^^^^

            ---> intones ----> NIL
<a> ------<
            ---> features ---> ((LEX a))

            ---> intones ----> NIL
<credit> -<
            ---> features ---> ((LEX credit))

            ---> intones ----> NIL
<card> ---<
            ---> features ---> ((LEX card))

For HLP input, the WordStream becomes

            ---> intones ----> NIL
<you> ----<
            ---> features ---> ((NAccent ++)(CAT NP)(LEX you)(Focus ++))

            ---> intones ----> NIL
<can> ----<
            ---> features ---> ((CAT Aux)(LEX can))

            ---> intones ----> NIL
<pay> ----<
            ---> features ---> ((CAT Verb)(LEX pay))

            ---> intones ----> NIL
<for> ----<
            ---> features ---> ((PhraseLevel :C)(CAT Prep)(LEX for))
                                 ^^^^^^^^^^^^^^

            ---> intones ----> NIL
<the> ----<
            ---> features ---> ((CAT Det)(LEX the))

            ---> intones ----> NIL
<hotel> --<
            ---> features ---> ((NAccent ++)(CAT Noun)(LEX hotel)(Focus ++))

            ---> intones ----> NIL
<with> ---<
            ---> features ---> ((PhraseLevel :C) (CAT Prep) (LEX with))
                                 ^^^^^^^^^^^^^^

            ---> intones ----> NIL
<a> ------<
            ---> features ---> ((CAT Det)(LEX a))

            ---> intones ----> NIL
<credit> -<
            ---> features ---> ((CAT Adj)(LEX credit))

            ---> intones ----> NIL
<card> ---<
            ---> features ---> ((NAccent ++)(CAT Noun)(LEX card)(Focus ++))

The hlp_predict_pros_events Function

The hlp_predict_pros_events function calls hlp_phr_module. This module decides which prosodic prediction strategy to use and applies it. Three strategies are presently available; it will be assumed here that the default, Hirschberg, is selected.

hlp_phr_module causes hlp_predict_pros_events to call hlp_addacc_module. This module is in file

           ~chatr/src/hlp/hlp_addacc.c

The module hlp_addacc_module calls three functions: hlp_mark_aux, aa_complex_nominals and aa_assign_accents. These functions perform three actions, described in turn below.

Each time the hlp_mark_aux function finds a verb, it is tested to determine if it may actually be an auxiliary.(4) If this proves so, a `(CAT Aux)' is added to the features field and the `(CAT Verb)' (if it exists) removed. In our present example the auxiliary verb `can' has been correctly tagged (this is tough enough already without adding problems for effect!), so this function will not need to make any changes.

The aa_complex_nominals function calls two further functions: aa_cn_simple_assign and aa_cn_assign. Their purpose is to assign the correct stress to complex nominals. A complex nominal is a noun and adjective pair which forms a single concept, such as `credit card'. For each word of a complex nominal, the aa_cn_assign function decides which one has to be stressed and which one unstressed. The former has a `(CN Stress)' added to the features field, and the latter a `(CN Unstress)'.

The aa_assign_accents function calls aa_accent_assign which calls the function aa_aaa. This in turn calls two further functions, hlp_closed_deaccented and hlp_closed_accented. Influenced by pre-existing features and those added since the start of processing, these functions decide the type of accents required (`(HAccent +)', `(HAccent -)', `(HAccent ++)' or `(HAccent c)') and add them to the features fields. Should a `HAccent' or `NAccent' already exist in a features field, none is added. The IntoneStream will be built from these features later.
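The rule that an existing accent feature is never overridden can be sketched as below. This is not the Hirschberg strategy itself: the closed-class word list and the choice between `+' and `-' are hypothetical simplifications, chosen only so that the sketch reproduces the values seen in the current example.

```python
# Hypothetical closed-class word list; CHATR's real test is not shown here.
CLOSED_CLASS = {"can", "for", "the", "with", "a"}


def assign_accent(cell):
    """Add an HAccent feature unless an accent feature is already present."""
    features = cell["features"]
    if any(name in ("HAccent", "NAccent") for name, _ in features):
        return cell                      # existing accent wins, nothing added
    if ("CN", "Unstress") in features or cell["word"] in CLOSED_CLASS:
        value = "-"
    else:
        value = "+"
    features.insert(0, ("HAccent", value))
    return cell


credit = assign_accent({"word": "credit",
                        "features": [("CN", "Unstress"), ("CAT", "Adj"),
                                     ("LEX", "credit")]})
card = assign_accent({"word": "card",
                      "features": [("CN", "Stress"), ("NAccent", "++"),
                                   ("CAT", "Noun"), ("LEX", "card")]})
pay = assign_accent({"word": "pay", "features": [("LEX", "pay")]})
print(credit["features"])
print(card["features"])
print(pay["features"])
```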

For Text input, execution of this module changes the WordStream to

            ---> intones ----> NIL
<you> ----<
            ---> features ---> ((HAccent +)(LEX you))
                                 ^^^^^^^^^

            ---> intones ----> NIL
<can> ----<
            ---> features ---> ((HAccent -)(LEX can))
                                 ^^^^^^^^^

            ---> intones ----> NIL
<pay> ----<
            ---> features ---> ((HAccent +)(LEX pay))
                                 ^^^^^^^^^

            ---> intones ----> NIL
<for> ----<
            ---> features ---> ((HAccent -)(PhraseLevel :C)(LEX for))
                                 ^^^^^^^^^

            ---> intones ----> NIL
<the> ----<
            ---> features ---> ((HAccent -)(LEX the))
                                 ^^^^^^^^^

            ---> intones ----> NIL
<hotel> --<
            ---> features ---> ((HAccent +)(LEX hotel))
                                 ^^^^^^^^^

            ---> intones ----> NIL
<with> ---<
            ---> features ---> ((HAccent -)(PhraseLevel :C)(LEX with))
                                 ^^^^^^^^^

            ---> intones ----> NIL
<a> ------<
            ---> features ---> ((HAccent -)(LEX a))
                                 ^^^^^^^^^

            ---> intones ----> NIL
<credit> -<
            ---> features ---> ((HAccent +)(LEX credit))
                                 ^^^^^^^^^

            ---> intones ----> NIL
<card> ---<
            ---> features ---> ((HAccent +)(LEX card))
                                 ^^^^^^^^^

For HLP input, the WordStream becomes

            ---> intones ----> NIL
<you> ----<
            ---> features ---> ((NAccent ++)(CAT NP)(LEX you)(Focus ++))

            ---> intones ----> NIL
<can> ----<
            ---> features ---> ((HAccent -)(CAT Aux)(LEX can))
                                 ^^^^^^^^^

            ---> intones ----> NIL
<pay> ----<
            ---> features ---> ((HAccent +)(CAT Verb)(LEX pay))
                                 ^^^^^^^^^

            ---> intones ----> NIL
<for> ----<
            ---> features ---> ((HAccent -)(PhraseLevel :C)(CAT Prep)
                                 ^^^^^^^^^                 (LEX for))

            ---> intones ----> NIL
<the> ----<
            ---> features ---> ((HAccent -)(CAT Det)(LEX the))
                                 ^^^^^^^^^

            ---> intones ----> NIL
<hotel> --<
            ---> features ---> ((NAccent ++)(CAT Noun)(LEX hotel)(Focus ++))

            ---> intones ----> NIL
<with> ---<
            ---> features ---> ((HAccent -)(PhraseLevel :C)(CAT Prep)
                                 ^^^^^^^^^                 (LEX with))

            ---> intones ----> NIL
<a> ------<
            ---> features ---> ((HAccent -)(CAT Det)(LEX a))
                                 ^^^^^^^^^

            ---> intones ----> NIL
<credit> -<
            ---> features ---> ((HAccent -)(CN Unstress)(CAT Adj)(LEX credit))
                                 ^^^^^^^^^  ^^^^^^^^^^^

            ---> intones ----> NIL
<card> ---<
            ---> features ---> ((CN Stress)(NAccent ++)(CAT Noun)(LEX card)
                                 ^^^^^^^^^                       (Focus ++))

Comparing WordStreams, it can be seen that the one generated from HLP input contains far more accurate features than that from Text. This is a direct result of the superior information offered by HLP input.

The hlp_rephrase Function

The hlp_rephrase function calls the hlp_phrase_flatten function which in turn calls the hlp_remove_empty_phrase function which then calls the hlp_rebuild_phrase function. The last three functions cycle as often as required.

The hlp_rephrase function operates on the SphraseStream. (`S' stands for `Syntax'.) Three tasks are performed, and each is described below with reference to the SphraseStream from the HLP input of the current example

     (((PitchRange two) (Start 0.0) (PhraseLevel :S) (CAT S) (IFT Statement))
      (((NAccent ++) (CAT NP) (LEX you) (Focus ++)))
      (((CAT VP))
       (((HAccent -) (CAT Aux) (LEX can)))
       (((HAccent +) (CAT Verb) (LEX pay)))
       (((CAT PP))
        (((HAccent -) (PhraseLevel :C) (CAT Prep) (LEX for)))
        (((CAT NP))
         (((HAccent -) (CAT Det) (LEX the)))
         (((NAccent ++) (CAT Noun) (LEX hotel) (Focus ++)))))
       (((CAT PP))
        (((HAccent -) (PhraseLevel :C) (CAT Prep) (LEX with)))
        (((CAT NP))
         (((HAccent -) (CAT Det) (LEX a)))
         (((HAccent -) (CN Unstress) (CAT Adj) (LEX credit)))
         (((CN Stress) (NAccent ++) (CAT Noun) (LEX card) (Focus ++))))))),

The hlp_phrase_flatten function deletes the HLP nodes (viz. `(CAT NP)', `(CAT VP)' or `(CAT PP)') since they have served their purpose and are no longer useful. If the HLP input is viewed as a tree in which the leaves are words, this function puts the leaves all at the same level. The `tree' becomes

     (((PitchRange two) (Start 0.0) (PhraseLevel :S) (CAT S) (IFT Statement))
      ((NAccent ++) (CAT NP) (LEX you) (Focus ++))
      ((HAccent -) (CAT Aux) (LEX can))
      ((HAccent +) (CAT Verb) (LEX pay))
      ((PhraseLevel :C))
      ((HAccent -) (CAT Prep) (LEX for))
      ((HAccent -) (CAT Det) (LEX the))
      ((NAccent ++) (CAT Noun) (LEX hotel) (Focus ++))
      ((PhraseLevel :C))
      ((HAccent -) (CAT Prep) (LEX with))
      ((HAccent -) (CAT Det) (LEX a))
      ((HAccent -) (CN Unstress) (CAT Adj) (LEX credit))
      ((CN Stress) (NAccent ++) (CAT Noun) (LEX card) (Focus ++))),
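The flattening step can be sketched in Python on a shortened version of the tree. The list layout (`[features, child, ...]`) and the function names are assumptions for illustration. As in the listing above, a `PhraseLevel' feature found on a word is moved out into its own entry just before that word.

```python
def collect_leaves(node, out):
    """Append the word entries below node to out, dropping intermediate nodes."""
    features, children = node[0], node[1:]
    if children:
        # Intermediate node such as NP, VP or PP: discard it, keep its leaves.
        for child in children:
            collect_leaves(child, out)
        return
    level = [f for f in features if f[0] == "PhraseLevel"]
    if level:
        out.append(level)                # phrase marker becomes its own entry
    out.append([f for f in features if f[0] != "PhraseLevel"])


def phrase_flatten(root):
    """Return [root features, entry, entry, ...] with all words at one level."""
    flat = [root[0]]
    for child in root[1:]:
        collect_leaves(child, flat)
    return flat


tree = [
    [("CAT", "S")],
    [[("LEX", "you")]],
    [[("CAT", "VP")],
     [[("LEX", "pay")]],
     [[("CAT", "PP")],
      [[("PhraseLevel", ":C"), ("LEX", "for")]],
      [[("CAT", "NP")],
       [[("LEX", "the")]],
       [[("LEX", "hotel")]]]]],
]
flat = phrase_flatten(tree)
for entry in flat:
    print(entry)
```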

The hlp_remove_empty_phrase function cleans the SphraseStream by locating empty phrases and removing them. In the current example there are none present, so nothing will change.

The hlp_rebuild_phrase function rebuilds the SphraseStream into a tree form by extracting the `PhraseLevel' features and making nodes of them. For HLP input the SphraseStream becomes

     ((((PitchRange two) (Start 0.0) (PhraseLevel :S) (CAT S) (IFT Statement))
       (((NAccent ++) (CAT NP) (LEX you) (Focus ++)))
       (((HAccent -) (CAT Aux) (LEX can)))
       (((HAccent +) (CAT Verb) (LEX pay)))
       (((PhraseLevel :C))
        (((HAccent -) (CAT Prep) (LEX for)))
        (((HAccent -) (CAT Det) (LEX the)))
        (((NAccent ++) (CAT Noun) (LEX hotel) (Focus ++))))
       (((PhraseLevel :C))
        (((HAccent -) (CAT Prep) (LEX with)))
        (((HAccent -) (CAT Det) (LEX a)))
        (((HAccent -) (CN Unstress) (CAT Adj) (LEX credit)))
        (((CN Stress) (NAccent ++)  (CAT Noun) (LEX card) (Focus ++))))))

For Text input (already having a flat HLP tree), the SphraseStream changes to

     ((((PitchRange two) (Start 0.0) (PhraseLevel :S) (CAT S) (IFT Statement))
       (((HAccent +) (LEX you)))
       (((HAccent -) (LEX can)))
       (((HAccent +) (LEX pay)))
       (((PhraseLevel :C))
        (((HAccent -) (LEX for)))
        (((HAccent -) (LEX the)))
        (((HAccent +) (LEX hotel))))
       (((PhraseLevel :C))
        (((HAccent -) (LEX with)))
        (((HAccent -) (LEX a)))
        (((HAccent +) (LEX credit)))
        (((HAccent +) (LEX card))))))
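The rebuilding step can be sketched in the same spirit. The input is a flat list whose first entry holds the sentence features and whose remaining entries are either word features or a lone `PhraseLevel' marker. The layout and the name rebuild_phrase are assumptions made for this example, not CHATR's own data structures.

```python
def rebuild_phrase(flat):
    """Group words under the PhraseLevel marker that precedes them."""
    root = [flat[0]]
    current = root                       # words go here until a marker is seen
    for features in flat[1:]:
        if any(name == "PhraseLevel" for name, _ in features):
            current = [features]         # start a new phrase node
            root.append(current)
        else:
            current.append([features])   # word node: [features]
    return [root]


flat = [
    [("PhraseLevel", ":S"), ("CAT", "S"), ("IFT", "Statement")],
    [("HAccent", "+"), ("LEX", "you")],
    [("HAccent", "-"), ("LEX", "can")],
    [("HAccent", "+"), ("LEX", "pay")],
    [("PhraseLevel", ":C")],
    [("HAccent", "-"), ("LEX", "for")],
    [("HAccent", "-"), ("LEX", "the")],
    [("HAccent", "+"), ("LEX", "hotel")],
]
stream = rebuild_phrase(flat)
sentence = stream[0]
print(len(sentence) - 1, "children under the sentence node")
print(sentence[4])
```

The first three words stay directly under the sentence node, while the words after the marker are grouped under a new `(PhraseLevel :C)' node, as in the listings above.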

The add_boundaries Function

The add_boundaries function is in file

     ~chatr/src/lex

This function calls two further functions, find_left_boundary and find_right_boundary.

The purpose of these functions is to locate and mark the left and right boundaries between each word. Remember that speech will eventually be formed by concatenation of phonemes to form words and the spaces (silence) between them. So not just the position of break is noted; a value is assigned which indicates the unit space to be allocated later between those words. The figures are based on the break indexes already determined by hlp_phr_module. These values are adjusted, however; a break index of 1 becomes a boundary value of 0, and a break index of 4 becomes a boundary of 2. In case of conflict the highest value is chosen. The left boundary of the first word and the right boundary of the last are set to 4.

The boundary values for the WordStream of the present example are

     4  you  0

     0  can  0 

     0  pay  2

     2  for  0

     0  the  0

     0  hotel  2

     2  with  0

     0  a  0

     0  credit  0

     0  card  4

Boundary values are kept in the left_boundary and right_boundary fields of each word.
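The mapping from break indexes to boundary values can be checked with a short Python sketch. It assumes that the break index preceding each word is already known (the list break_before is a hypothetical input) and it does not model the conflict rule, since no conflict arises in this example.

```python
BREAK_TO_BOUNDARY = {1: 0, 4: 2}   # break index -> boundary value


def add_boundaries(words, break_before):
    """Return (left_boundary, word, right_boundary) for every word.

    break_before[i] is the break index between word i-1 and word i;
    the entry for the first word is ignored.
    """
    if not words:
        return []
    left = [0] * len(words)
    right = [0] * len(words)
    for i in range(1, len(words)):
        value = BREAK_TO_BOUNDARY[break_before[i]]
        left[i] = value          # boundary on the left of this word
        right[i - 1] = value     # same boundary on the right of the previous word
    left[0] = 4                  # utterance-initial boundary
    right[-1] = 4                # utterance-final boundary
    return list(zip(left, words, right))


words = "you can pay for the hotel with a credit card".split()
break_before = [None, 1, 1, 4, 1, 1, 4, 1, 1, 1]
boundaries = add_boundaries(words, break_before)
for left_value, word, right_value in boundaries:
    print(left_value, word, right_value)
```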

The hlp_realise_accents Function

The hlp_realise_accents function calls the hlp_apply_patterns function which in turn calls the hlp_apply_pattern function. This function cycles as often as necessary and calls the function hlp_apply_actions which cycles too. Finally the hlp_apply_actions function calls hlp_apply_simple_actions.

The hlp_realise_accents function applies the pattern rules stored in the HLP_Patterns variable. These rules take the form

     (Statement  (START ) 
                 (HAccent (+ (H*)) 
                          (++ (L+H*)))
                 (PHRASE (H-))
                 (TAIL (L-L%)))
     (YNQuestion (START ) 
                 (HAccent (+ (L*)))
                 (TAIL (H-H%)))
     (Question   (START ) 
                 (HAccent (+ (L*)))
                 (TAIL (L-L%)))
     (*          (START)
                 (HAccent (+ (H*)))
                 (PHRASE (H-))
                 (TAIL (H-L%)))

Some actions, like START, PHRASE or TAIL, are considered special because they concern phrases. These are applied by the hlp_apply_actions function. Others, like HAccent, are said to be simple because they concern words. They are applied by the function hlp_apply_simple_actions.

The current example is a `Statement' utterance type, so the part of the pattern rules which is going to be used is

     (Statement  (START ) 
                 (HAccent (+ (H*)) 
                          (++ (L+H*)))
                 (PHRASE (H-))
                 (TAIL (L-L%)))

The hlp_realise_accents function is the first to affect the `intones' field of the WordStream. If a word has an `(HAccent +)' feature, an `(H*)' intone will be added to its intones field. If it is the last word of a phrase, an `(H-)' intone will also be added.
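A compact Python sketch of the `Statement' pattern follows. It is an illustration only: the dictionary layout, the representation of the utterance as a list of phrases, and the function names are assumptions. The simple action (HAccent) is applied per word, the special actions (PHRASE and TAIL) per phrase.

```python
STATEMENT = {
    "HAccent": {"+": "H*", "++": "L+H*"},
    "PHRASE": "H-",
    "TAIL": "L-L%",
}


def realise_accents(phrases, pattern=STATEMENT):
    """Fill the intones field of every word cell according to the pattern."""
    for number, phrase in enumerate(phrases):
        for cell in phrase:
            accent = dict(cell["features"]).get("HAccent")
            tone = pattern["HAccent"].get(accent)
            if tone is not None:
                cell["intones"].append(tone)         # simple action
        is_last_phrase = number == len(phrases) - 1
        phrase[-1]["intones"].append(                # special action
            pattern["TAIL"] if is_last_phrase else pattern["PHRASE"])
    return phrases


def make_cell(word, accent):
    return {"word": word, "intones": [],
            "features": [("HAccent", accent), ("LEX", word)]}


phrases = realise_accents([
    [make_cell("you", "+"), make_cell("can", "-"), make_cell("pay", "+")],
    [make_cell("for", "-"), make_cell("the", "-"), make_cell("hotel", "+")],
    [make_cell("with", "-"), make_cell("a", "-"),
     make_cell("credit", "+"), make_cell("card", "+")],
])
intones = {c["word"]: c["intones"] for phrase in phrases for c in phrase}
print(intones)
```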

For Text input, execution of this module changes the WordStream to

            ---> intones ----> ((H*))
<you> ----<                      ^^
            ---> features ---> ((HAccent +) (LEX you))

            ---> intones ----> NIL
<can> ----<
            ---> features ---> ((HAccent -) (LEX can))

            ---> intones ----> ((H*) (H-))
<pay> ----<                      ^^   ^^
            ---> features ---> ((HAccent +) (LEX pay))

            ---> intones ----> NIL
<for> ----<
            ---> features ---> ((HAccent -) (PhraseLevel :C) (LEX for))

            ---> intones ----> NIL
<the> ----<
            ---> features ---> ((HAccent -) (LEX the))

            ---> intones ----> ((H*) (H-))
<hotel> --<                      ^^   ^^
            ---> features ---> ((HAccent +) (LEX hotel))

            ---> intones ----> NIL
<with> ---<
            ---> features ---> ((HAccent -) (PhraseLevel :C) (LEX with))

            ---> intones ----> NIL
<a> ------<
            ---> features ---> ((HAccent -) (LEX a))

            ---> intones ----> ((H*))
<credit> -<                      ^^
            ---> features ---> ((HAccent +) (LEX credit))

            ---> intones ----> ((H*) (L-L%))
<card> ---<                      ^^   ^^^^
            ---> features ---> ((HAccent +) (LEX card))

For HLP input, the WordStream becomes

            ---> intones ----> NIL
<you> ----<
            ---> features ---> ((NAccent ++) (CAT NP) (LEX you) (Focus ++))

            ---> intones ----> NIL
<can> ----<
            ---> features ---> ((HAccent -) (CAT Aux) (LEX can))

            ---> intones ----> ((H*) (H-))
<pay> ----<                      ^^   ^^
            ---> features ---> ((HAccent +) (CAT Verb) (LEX pay))

            ---> intones ----> NIL
<for> ----<
            ---> features ---> ((HAccent -)(PhraseLevel :C)(CAT Prep)
                                                           (LEX for))

            ---> intones ----> NIL
<the> ----<
            ---> features ---> ((HAccent -) (CAT Det) (LEX the))

            ---> intones ----> ((H-))
<hotel> --<                      ^^
            ---> features ---> ((NAccent ++)(CAT Noun)(LEX hotel)(Focus ++))

            ---> intones ----> NIL
<with> ---<
            ---> features ---> ((HAccent -)(PhraseLevel :C)(CAT Prep)
                                                           (LEX with))

            ---> intones ----> NIL
<a> ------<
            ---> features ---> ((HAccent -) (CAT Det) (LEX a))

            ---> intones ----> NIL
<credit> -<
            ---> features ---> ((HAccent -)(CN Unstress)(CAT Adj)(LEX credit))

            ---> intones ----> ((L-L%))
<card> ---<                      ^^^^
            ---> features ---> ((CN Stress)(NAccent ++)(CAT Noun)(LEX card)
                                                                (Focus ++))

The word Module

The `word' module is in file

     ~chatr/src/lex/word.c

word_module calls two functions, add_boundaries and add_intonation, and two modules, lexicon_module and reduce_module. The add_intonation function is in file

     ~chatr/src/intonation/intonation.c

The `reduce' module is in file

     ~chatr/src/lex/reduce.c

This module creates three streams: the SylStream, the PhoneStream and the IntoneStream.

Appropriate cells of each of these streams are linked to those of the WordStream.

For text or HLP input, the add_boundaries function has already been called once by hlp_module. See section The add_boundaries Function, for a description.

The lexicon Module

lexicon_module calls three functions, lex_lookup, add_syllables and add_phonemes. This module consults the lexicon for each word of the WordStream. The lexicon contains all the words CHATR can utter, with their decomposition into syllables and phonemes. The SylStream and PhoneStream are built from this information. The SylStream associated with the current example is

     (<y uu> --------- lex_stress ------ (0)

      <k @ n> -------- lex_stress ------ (0)

      <p ei> --------- lex_stress ------ (1)

      <f @> ---------- lex_stress ------ (0)

      <dh @> --------- lex_stress ------ (0)

      <h ou> --------- lex_stress ------ (0)

      <t e l> -------- lex_stress ------ (1)

      <w i th> ------- lex_stress ------ (0)

      < @ > ---------- lex_stress ------ (0)

      <k r e> -------- lex_stress ------ (1)

      <d i t> -------- lex_stress ------ (0)

      <k aa d>) ------ lex_stress ------ (1)

     [(0) represents `unstressed', (1) `stressed']
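The lookup can be pictured with a toy lexicon in Python. The dictionary below holds only the ten words of the example, with the syllables and stress values copied from the listing above; the data layout and the name lexicon_lookup are assumptions, and a word missing from the dictionary would simply raise a KeyError in this sketch.

```python
# Toy lexicon: word -> list of (phonemes, lex_stress) per syllable.
LEXICON = {
    "you": [(["y", "uu"], 0)],
    "can": [(["k", "@", "n"], 0)],
    "pay": [(["p", "ei"], 1)],
    "for": [(["f", "@"], 0)],
    "the": [(["dh", "@"], 0)],
    "hotel": [(["h", "ou"], 0), (["t", "e", "l"], 1)],
    "with": [(["w", "i", "th"], 0)],
    "a": [(["@"], 0)],
    "credit": [(["k", "r", "e"], 1), (["d", "i", "t"], 0)],
    "card": [(["k", "aa", "d"], 1)],
}


def lexicon_lookup(words):
    """Build a syllable stream and a phoneme stream from the word list."""
    syl_stream, phone_stream = [], []
    for word in words:
        for phones, stress in LEXICON[word]:
            # Each syllable cell keeps a link back to its word.
            syl_stream.append({"word": word, "phones": phones,
                               "lex_stress": stress})
            phone_stream.extend(phones)
    return syl_stream, phone_stream


words = "you can pay for the hotel with a credit card".split()
syl_stream, phone_stream = lexicon_lookup(words)
for syl in syl_stream:
    print(" ".join(syl["phones"]), syl["lex_stress"])
```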

The add_intonation Function

The add_intonation function calls the function make_intonation_cell. These functions build the IntoneStream from information in the intones field of the WordStream. Depending on the input type, the cells of the intones fields have been filled in different ways. This results in three quite different IntoneStreams. For text input, the IntoneStream will be

     (<you>    <=====================> (<H*>
      <can>
                 ====================>  <H*> 
      <pay>    <
                 ====================>  <H->
      <for>
      <the>
                 ====================>  <H*>
      <hotel>  <
                 ====================>  <H->
      <with>
      <a>
      <credit> <=====================>  <H*>
                 ====================>  <H*>
      <card>)  <
                 ====================>  <L-L%>)

and for PhonoWord input

     (<you>    <=====================> (<H*>
      <can>
      <pay>
      <for>
      <the>
      <hotel>  <=====================>  <H*>
      <with>
      <a>
      <credit>
                 ====================>  <H*>
      <card>)  <
                 ====================>  <L-L%>)

finally, for HLP input

     (<you>                            (
      <can>
                 ====================>  <H*> 
      <pay>    <
                 ====================>  <H->
      <for>
      <the>
      <hotel>  <=====================>  <H->
      <with>
      <a>
      <credit>
      <card>)  <=====================>  <L-L%>)
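
As a rough sketch of how an IntoneStream can be derived from the intones field of each word, the following flattens per-word intone lists into one stream while remembering which word each cell belongs to. The tuple layout and the name build_intone_stream are assumptions made for illustration; the data reproduces the PhonoWord case shown above.

```python
def build_intone_stream(word_stream):
    """Flatten per-word intone lists into one stream (hypothetical helper).

    word_stream is a list of (word, intones) pairs; each output cell keeps
    the index of the word it is linked to.
    """
    intone_stream = []
    for index, (word, intones) in enumerate(word_stream):
        for label in intones:
            intone_stream.append(
                {"label": label, "word_index": index, "word": word}
            )
    return intone_stream


# PhonoWord example: accents supplied by the user on `you', `hotel' and `card'.
phonoword = [
    ("you", ["H*"]), ("can", []), ("pay", []), ("for", []), ("the", []),
    ("hotel", ["H*"]), ("with", []), ("a", []), ("credit", []),
    ("card", ["H*", "L-L%"]),
]
stream = build_intone_stream(phonoword)
```

A word with an empty intones field contributes no cell, which is why the PhonoWord IntoneStream is shorter than the one produced from text input.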

The HLP input IntoneStream may appear rather sparse, especially when considering the length of code that produced it compared to the size of other modules. This is in fact due to the HLP_Patterns variable. In the one used for this example, there were no mappings for (NAccent ++) or (HAccent -) features. Had those been added, making the `statement' part of the variable

     (Statement  (START ) 
                 (HAccent (+ (H*)) 
                          (++ (L+H*))
                          (- (L*)))
                 (NAccent (++ (H+!H*)))
                 (PHRASE (H-))
                 (TAIL (L-L%)))

the text input IntoneStream would have been

     (<you>    <=====================> (<H*>
      <can>    <=====================>  <L*>
                 ====================>  <H*> 
      <pay>    <
                 ====================>  <H->
      <for>    <=====================>  <L*>
      <the>    <=====================>  <L*>
                 ====================>  <H*>
      <hotel>  <
                 ====================>  <H->
      <with>   <=====================>  <L*>
      <a>      <=====================>  <L*>
      <credit> <=====================>  <H*>
                 ====================>  <H*>
      <card>)  <
                 ====================>  <L-L%>)

and for HLP input

     (<you>    <=====================> (<H+!H*>
      <can>    <=====================>  <L*>
                 ====================>  <H*> 
      <pay>    <
                 ====================>  <H->
      <for>    <=====================>  <L*>
      <the>    <=====================>  <L*>
      <hotel>  <=====================>  <H->
      <with>   <=====================>  <L*>
      <a>      <=====================>  <L*>
      <credit> <=====================>  <L*>
      <card>)  <=====================>  <L-L%>)

So for little improvement in high intonation, the modified variable has introduced a lot of low intonation `clutter'. With regard to HLP input, although `hotel' and `card' are tagged (Focus ++) like `you', there is no (H+!H*) Intone cell aligned with these words. This is because CHATR accepts only one (Focus ++) marked word in any one sentence; subsequent occurrences are ignored. This part of the code could of course easily be changed in the hlp_realise_accent function. For text input, the IntoneStream is further affected: too many (H*) intones have been added (usually on every noun, pronoun and verb). Important words are therefore hidden among less important ones.
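
The effect of the pattern table and of the one-focus rule can be sketched as follows. This is an illustration under stated assumptions, not the code of hlp_realise_accent: the table mirrors the modified `statement' patterns above, the function name is hypothetical, and treating a (Focus ++) word as carrying the (NAccent ++) feature is an assumption.

```python
# Accent mappings taken from the modified `statement' patterns above.
STATEMENT_PATTERNS = {
    ("HAccent", "+"): "H*",
    ("HAccent", "++"): "L+H*",
    ("HAccent", "-"): "L*",
    ("NAccent", "++"): "H+!H*",
}


def realise_accents_sketch(features):
    """Map one feature per word to an accent label, or None (hypothetical).

    Only the first ("NAccent", "++") feature in the sentence is realised;
    later ones are ignored, as described in the text.
    """
    accents = []
    focus_used = False
    for feature in features:
        if feature == ("NAccent", "++"):
            if focus_used:
                accents.append(None)  # a second focus is dropped
                continue
            focus_used = True
        # Features with no entry in the table yield no accent.
        accents.append(STATEMENT_PATTERNS.get(feature))
    return accents


features = [("NAccent", "++"), ("HAccent", "-"), ("HAccent", "+"), ("NAccent", "++")]
result = realise_accents_sketch(features)
```

Removing an entry from the table, such as the (HAccent -) mapping, simply makes the corresponding words accentless, which is the sparse behaviour seen with the original variable.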

With HLP input, it is not arduous for the user to mark the important words (if actually found necessary) by adding a (Focus ++) label. If (Focus ++) maps to a (H+!H*) accent, these words will sound different.

For PhonoWord input, the user is expected to supply all the accents. CHATR will not add new ones. While it may be quick to add accents like (H*) or (H+!H*) to important words (usually few in number), it rapidly becomes tedious to add accents like (L*) to unimportant words (usually numerous).

As far as the IntoneStream is concerned, HLP input has to be the best method. It automatically finds the accents for each word, and gives the user the capability to make changes, for instance to indicate words that need to be focused on. The only drawback is that an HLP input takes quite long to write.

The reduce Module

reduce_module calls two functions, contract_word and reduce_syls.

This module detects and performs contractions. For the grammatically challenged, this means it turns `would have' into `would've' or `he is' into `he's'. A word must satisfy several criteria to be contracted: it must be in a list of contractables (contained in the `contract_words' variable), such as have, has, are, am, would, etc.; both its left and right boundaries must be zero; and it must have no intone. If these criteria are met, the word is removed and the phoneme it cross-references to in the `contract_words' variable is added to the previous word. Contents of and links between streams are of course modified too.
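
The three criteria can be sketched as a single test followed by a merge. This is a minimal illustration, not the code of contract_word: the dictionary layout, the field names and the phoneme symbols in CONTRACT_WORDS are assumptions.

```python
# Hypothetical stand-in for the `contract_words' variable:
# contractable word -> phoneme appended to the previous word.
CONTRACT_WORDS = {"have": "v", "is": "z"}


def contract(words):
    """Return a new word list with contractable words merged (hypothetical)."""
    result = []
    for word in words:
        contractable = (
            len(result) > 0                      # there must be a previous word
            and word["name"] in CONTRACT_WORDS   # listed as contractable
            and word["left"] == 0                # left boundary is zero
            and word["right"] == 0               # right boundary is zero
            and not word["intones"]              # carries no intone
        )
        if contractable:
            previous = result[-1]
            # Drop the word and attach its reduced phoneme to the previous one.
            result[-1] = dict(
                previous,
                phones=previous["phones"] + [CONTRACT_WORDS[word["name"]]],
            )
        else:
            result.append(word)
    return result


he = {"name": "he", "phones": ["h", "ii"], "left": 0, "right": 0, "intones": []}
is_plain = {"name": "is", "phones": ["i", "z"], "left": 0, "right": 0, "intones": []}
is_accented = dict(is_plain, intones=["H*"])

contracted = contract([he, is_plain])
kept = contract([he, is_accented])
```

In the second call the word carries an intone, so it fails the last criterion and is left untouched.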

The phonology Module

The `phonology' module is in file

     ~chatr/src/phoneme/phonology.c

The module phonology_module calls three functions: fill_phoneme, phone_to_segment and phrase_pause. Depending on the pause prediction method selected, the last function calls either pp_disctree or insert_phrase_pause. If called, insert_phrase_pause further calls insert_pause.

The insert_phrase_pause and insert_pause functions are in file

     ~chatr/src/intonation/phrase_int.c

This module affects two streams, the PhoneStream and the SegStream.

The fill_phoneme function fills the features of the PhoneStream that was previously created by the `word' module. See section The word Module, for creation information.

The phone_to_segment function builds the SegStream.

The phrase_pause function inserts silence segments where needed, according to the chosen pause prediction method. If pp_disctree is selected, a silence segment is inserted after every comma, colon, question mark or full stop (period), if the following phrase contains at least one stressed syllable. If insert_phrase_pause is selected, a silence segment is inserted at every phrase break (phrase_level :C).

For new or basic users, the insert_phrase_pause method is recommended as a start, since it utilizes work done by previous modules.
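
The insert_phrase_pause method can be pictured with a small sketch in which a silence segment, written `#' as in the duration tables, is appended after every word that ends a phrase. The phrase_break flag is a hypothetical stand-in for the phrase level information set by earlier modules, and the function name is an assumption.

```python
def insert_pauses(words):
    """Build a segment list with `#' after each phrase-final word (hypothetical)."""
    segments = []
    for word in words:
        segments.extend(word["phones"])
        if word["phrase_break"]:
            segments.append("#")  # silence segment at the phrase break
    return segments


words = [
    {"phones": ["p", "ei"], "phrase_break": True},   # `pay' ends a phrase
    {"phones": ["f", "@"], "phrase_break": False},   # `for'
]
segments = insert_pauses(words)
```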

The intone Module

The `intone' module is in file

     ~chatr/src/intonation/intonation.c

intone_module calls the function tobi_intonation. This function is in file

     ~chatr/src/intonation/ToBI.c

This module fills the IntoneStream. The method currently used is the ToBI intonation method (H*, L*, H-,...). For each syllable of the SylStream, tobi_intonation predicts pitch accents (H*, L*,...), phrase accents (H-, L-,...) and boundary tones (L%,...).

However, for text or HLP input it is recommended that this module is either bypassed or at least run with pitch accent prediction switched off. The reason for this is that the HLP Module (see section The hlp Module) has already supplied sufficient intonation information; more is not necessary and possibly counterproductive.

For PhonoWord input this module is useful. It supplies further information about phrase accents and boundaries. Using the current example, the WordStream becomes

     (<you> -------- intones ------ ((H*))
      <can>
      <pay> -------- intones ------ ((H-))
      <for>                          ^^^^
      <the>
      <hotel> ------ intones ------ ((H-))((H*))
      <with>                         ^^^^
      <a>
      <credit>
      <card>) ------ intones ------ ((H*)(L-L%))

and the IntoneStream

     (<you>    <=====================> (<H*>
      <can>
      <pay>    <=====================>  <H->
      <for>
      <the>
                 ====================>  <H*>
      <hotel>  <
                 ====================>  <H->
      <with>                            ^^^^
      <a>
      <credit>
                 ====================>  <H*>
      <card>)  <
                 ====================>  <L-L%>)

Note that other intone cells already present, such as the `<L-L%>' on `card', are further added by this module.

The duration Module

The `duration' module is in file

     ~chatr/src/duration/duration.c

duration_module calls three functions, lr_dur, dur_mark_all_segs and dur_mark_all_syls. The function lr_dur calls two further functions, lrd_segs_durations and lrd_pause_durations. The last function calls one more, pause_duration.

The `duration' module determines the finite timing for the constituents of an utterance. The lrd_segs_durations function determines the duration of the segments. The function lrd_pause_durations determines that of the pauses, in accordance with the PhraseLevel type. The dur_mark_all_segs function marks absolute starts for segments. The dur_mark_all_syls function does the same for syllables.

For the current example, the segments and their resulting absolute start timings (in ms) for each input method considered are

     Segs    PhonoWord      Text          HLP

     y            0            0            0
     uu          66           64           61
     k          118          117           89  
     @          237          235          208 
     n          276          274          247 
     p          317          315          288 
     ei         419          423          396 
     #          538          565          538 
     f          638          665          638 
     @          737          764          737 
     dh         790          817          790 
     @          837          864          837 
     h          893          920          893 
     ou         985         1012          979
     t         1111         1138         1083
     e         1205         1232         1169
     l         1333         1360         1278
     #         1421         1448         1361
     w         1521         1548         1461
     i
     th
     @         1664         1691         1604        
     k         1716         1743         1656 
     r
     e
     d         1926         1987         1866      
     i
     t
     k         2112         2198         2052
     aa
     d

     [`#' indicates a pause]

Syllables and absolute start timings (again in ms) are

     Sylls       PhonoWord     Text      HLP
     
     (y uu)            0         0         0           
     (k @ n)         118       117        89          
     (p ei)          317       315       288         
     (f @)           638       665       638         
     (dh @)          790       817       790         
     (h ou)          893       920       893         
     (t e l)        1111      1138      1083        
     (w i dh)       1521      1548      1461
     (@)            1664      1691      1604        
     (k r e)        1716      1743      1656        
     (d i t)        1926      1987      1866        
     (k aa d)       2112      2198      2052

Since prosody is different for each type of input, durations are correspondingly different.
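
The relation between durations and absolute starts can be sketched as a running sum: each segment starts where the previous one ends, and each syllable starts with its first segment. The function names are hypothetical; the durations are the differences between consecutive PhonoWord start times in the segment table above.

```python
def segment_starts(durations):
    """Absolute start of each segment, given the segment durations."""
    starts = []
    clock = 0
    for duration in durations:
        starts.append(clock)
        clock += duration
    return starts


def syllable_starts(syllable_sizes, seg_starts):
    """Absolute start of each syllable: the start of its first segment."""
    starts = []
    index = 0
    for size in syllable_sizes:
        starts.append(seg_starts[index])
        index += size
    return starts


# PhonoWord durations for y, uu, k, @, n, p, ei (differences of table starts).
durations = [66, 52, 119, 39, 41, 102, 119]
seg = segment_starts(durations)
# Syllables (y uu), (k @ n), (p ei) contain 2, 3 and 2 segments.
syl = syllable_starts([2, 3, 2], seg)
```

The resulting syllable starts agree with the first three PhonoWord entries of the syllable table.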

The int_target Module

The `int_target' module is in file

     ~chatr/src/intonation/intonation.c

The module int_target_module calls the function tobi_make_targets which in turn calls tobi_make_targets_lr. This function calls three further functions, lr_predict, f0_normalize and tobi_add_target_seg.

The tobi_make_targets and tobi_make_targets_lr functions are in file

     ~chatr/src/intonation/ToBI.c

The int_target_module function determines the f0 value for each syllable of the SylStream. The method presently used is linear regression for ToBI intonation. Three f0 values (normalized) are given for each syllable by the function tobi_make_targets_lr. The first is for the first segment of the syllable, the second for the nucleus segment (the vowel) and the third for the last segment. The f0 values for the current example are

         PhonoWord        Text            HLP
* (y uu)                                          (syllable)
        196.917068      192.78118      176.45158  (f0 before normalization)
  y     142.290955      140.17704      131.83081  (1st segment + f0
        256.088196      252.19085      205.92985             normalized)        
  uu    172.533966      170.54199      146.89747  (nucleus segment)
        242.849289      239.84178      209.83558
  uu    165.767410      164.23024      148.89373  (last segment)
* (k @ n)
        204.067780      201.49116      173.36627
  k     145.945755      144.62881      130.25387
        228.281631      228.18071      215.93901
  @     158.321716      158.27014      152.01327
        213.495285      231.47927      225.05279
  n     150.764252      159.95608      156.67143
* (p ei)
        192.711884      209.04147      201.94967
  p     140.141632      148.48786      144.86317
        181.688629      227.94963      229.74664
  ei    134.507523      158.15203      159.07051
        161.305099      191.31132      196.99272
  ei    124.089272      139.42578      142.32962
* (f @)
        169.542007      199.16011      201.15170
  f     128.299255      143.43739      144.45532
        151.233627      170.03631      171.95320
  @     118.941635      128.55189      129.53163
        151.086487      161.10058      164.70037
  @     118.866425      123.98474      125.82463
* (dh @)
        151.363708      164.91391      171.62640
  dh    119.008118      125.93377      129.36460
        156.121429      165.93020      169.92852
  @     121.439842      126.45321      128.49679
        173.493652      187.45507      172.47857
  @     130.318985      137.45481      129.80015
* (h ou)
        171.729584      190.19567      176.44268
  h     129.417343      138.85557      131.82626
        210.256027      230.74173      184.58163
  ou    149.108643      159.57910      135.98617
        218.643097      226.80609      178.81588
  ou    153.395355      157.56755      133.03923
* (t e l)
        223.215790      227.31828      178.08351
  t     155.732513      157.82934      179.66101
  e     165.714127      163.17337      133.47117
        204.412720      193.96093      163.17901
  l     146.122055      140.78002      125.04705
* (w i dh)
        202.834274      196.86778      163.32327
  w     145.315292      142.26574      125.12078
        165.008423      156.45224      141.01254
  i     125.982086      121.60892      113.71752
        159.509415      153.40560      150.44239
  dh    123.171478      120.05175      118.53722
* (@)
        162.552399      155.83990      151.50260
  @     124.726784      121.29595      119.07910
        155.529129      151.53080      150.36933
  @     121.137108      119.09352      118.49988
        157.041901      172.01840      155.39329
  @     121.910309      129.56495      121.06768
* (k r e)
        163.318283      177.07127      163.75167
  k     125.118233      132.14753      125.33974
        137.365234      183.52533      139.22203
  e     111.853340      135.44627      112.80236
        137.160309      185.15049      137.73902
  e     111.748604      136.27691      112.04439
* (d i t)
        134.786163      186.64566      133.66029
  d     110.535149      137.04110      109.95970
        121.638107      184.22201      119.69841
  i     103.815033      135.80236      102.82363
        126.368195      167.65847      105.95538
  t     106.232635      127.33655       95.799423
* (k aa d)
        145.771805      190.14331      125.73971
  k     116.150032      138.82879      105.91140
        119.275711      136.78218       70.974113
  aa    102.607590      111.55533       77.920105
         52.435909       65.044205      20.000900
  d      68.445023       74.889259      51.867126

As with durations, f0 values are different for each type of input, since they have different prosodies.
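
The pairs of values in the table are consistent with a single linear mapping from the predicted f0 to the normalized f0, with a slope of roughly 0.511. The sketch below does not reproduce f0_normalize, whose parameters are not given here. It only fits a line through two PhonoWord rows of the table and applies it to other rows; the function names are hypothetical.

```python
def fit_line(point_a, point_b):
    """Slope and intercept of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = point_a, point_b
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# (f0 before normalization, f0 normalized) for `y' and the nucleus `uu'
# of the first syllable, PhonoWord column.
slope, intercept = fit_line(
    (196.917068, 142.290955),
    (256.088196, 172.533966),
)


def normalize(f0):
    """Apply the fitted linear mapping (an approximation of the table)."""
    return slope * f0 + intercept


# Last segment of (y uu): the table gives 242.849289 -> 165.767410.
predicted = normalize(242.849289)
```

The same fitted line also reproduces rows from other syllables and columns to within rounding, which suggests that one mapping is shared by all three input types.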

Now that duration values and f0 values have been determined, f0 contours can be built.

