
Phoneme Sets

Any set of symbols may be defined as a phoneme set. Before a phoneme set may be used it must be properly defined. The definitions include phonological features, so the system may use that information about a phoneme set if necessary. Mappings may also be specified between phoneme sets, which makes CHATR more flexible. Maps enable lexicons, synthesizers, and other modules that use different phoneme sets to work together, though performance may be less than optimal.

Currently the library directory includes definitions for the following phoneme sets (and mappings between them)

     mrpa
     beep
     Radio2
     darpa
     Holmes
     Japanese (nuuph)
     Korean
     Korean (H-code)
     German
     Chinese (Canton)

Note that even when a set is defined, it may not match another person's view of what that phoneme set is. A different name for silence, for instance, or different case conventions may make two phoneme sets formally different, even though the user regards them as the same. Phoneme set definitions therefore need careful use and may seem frustrating at first, but they are there to help: the system validates phonemes and will find mis-typings in data, which is invaluable when building large lexicons and unit databases.
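
The kind of validation described above can be pictured with a small Python sketch. It is an illustration only, not CHATR code: the lexicon, its format, and the function name `find_unknown` are invented for the example, and only the symbol names are taken from the `mrpa' set.

```python
# Hypothetical toy data: a few symbol names from the mrpa set and a made-up
# lexicon. Neither the lexicon format nor the function is part of CHATR.
MRPA_NAMES = {"k", "a", "t", "sh", "ii", "p", "#"}

LEXICON = {
    "cat": ["k", "a", "t"],
    "sheep": ["SH", "ii", "p"],   # wrong case: "SH" is not "sh"
    "tack": ["t", "a", "kk"],     # mis-typing: "kk" is not defined
}


def find_unknown(lexicon, names):
    """Map each word to the symbols in its entry that the set does not define."""
    problems = {}
    for word, phones in lexicon.items():
        bad = [p for p in phones if p not in names]
        if bad:
            problems[word] = bad
    return problems


print(find_unknown(LEXICON, MRPA_NAMES))
# prints {'sheep': ['SH'], 'tack': ['kk']}
```

Both the case mismatch and the mis-typed symbol are reported, which is the behaviour that makes strict definitions worthwhile for large data sets.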

A change in how CHATR deals with phonemes has been proposed. Essentially it would expand the power of the current definitions. Feature based descriptions would allow users to specify their own phoneme features and values. Also, mapping would be `feature-based' rather than simply `atomic symbol-based' as it is at present.

Phoneme Set Definitions

A phoneme set definition has the following syntax

     (Phoneme Def [name] ((phone~1 features)... (phone~n features)))

The name is an atom such as `mrpa', or `beep'. Each phoneme definition consists of a name followed by eight features. The features are

vc: Vowel or consonant. + = vowel, - = consonant.
lng: Vowel length. s = short, l = long, d = diphthong, a = schwa, 0 = consonant.
h: Vowel height. 1 = high, 2 = mid, 3 = low, - = consonant.
fr: Vowel frontness. 1 = front, 2 = mid, 3 = back, - = consonant.
rnd: Lip rounding. + = rounded, - = not rounded.
typ: Consonant type. s = stop, f = fricative, a = affricate, n = nasal, l = liquid, 0 = vowel.
plc: Place of consonant articulation. l = labial, a = alveolar, p = palatal, b = labio-dental, d = dental, v = velar, 0 = vowel.
vox: Consonant voiced or unvoiced. + = voiced, - = unvoiced.
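
As a rough illustration of the eight-feature layout, the following Python sketch parses one entry and checks each value against the list above. It is not part of CHATR, which places no formal restrictions on feature values; the names `FEATURE_ORDER`, `ALLOWED`, `parse_entry` and `check_entry` are invented here. The value `a' for lng is accepted because the `mrpa' definition in this section uses it for `@'.

```python
# Feature names in the order they appear in a phoneme entry.
FEATURE_ORDER = ("vc", "lng", "h", "fr", "rnd", "typ", "plc", "vox")

# Allowed values per feature, transcribed from the feature list.
# Assumption: "a" for lng is included because the mrpa definition uses it for "@".
ALLOWED = {
    "vc": {"+", "-"},
    "lng": {"s", "l", "d", "a", "0"},
    "h": {"1", "2", "3", "-"},
    "fr": {"1", "2", "3", "-"},
    "rnd": {"+", "-"},
    "typ": {"s", "f", "a", "n", "l", "0"},
    "plc": {"l", "a", "p", "b", "d", "v", "0"},
    "vox": {"+", "-"},
}


def parse_entry(line):
    """Parse one '(name f1 ... f8)' entry into (name, {feature: value})."""
    tokens = line.strip().strip("()").split()
    if len(tokens) != 1 + len(FEATURE_ORDER):
        raise ValueError(f"expected a name and 8 features, got {tokens!r}")
    name, values = tokens[0], tokens[1:]
    return name, dict(zip(FEATURE_ORDER, values))


def check_entry(line):
    """Return the (feature, value) pairs whose value is not in the list."""
    _, features = parse_entry(line)
    return [(f, v) for f, v in features.items() if v not in ALLOWED[f]]


name, features = parse_entry("(uh   +   s   2   3   -   0   0   -)")
print(name, features["vc"], features["h"], features["fr"])
print(check_entry("(p    -   0   -   -   +   x   l   -)"))
```

The second call reports `typ' with the value `x', since `x' is not one of the listed consonant types.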

A list of phonemes used by a particular set may be obtained using the command

     (Phoneme List phoneme-set-name)

Definitions for phoneme sets may be found in directory `~/chatr/lib/data/'.

As an example, the whole definition for `mrpa' is currently

     (Phoneme Def mrpa
      ;name  vc lng  h  fr  rnd typ plc vox
      (
       (uh   +   s   2   3   -   0   0   -)
       (e    +   s   2   1   -   0   0   -)
       (a    +   s   3   1   -   0   0   -)
       (o    +   s   3   3   -   0   0   -)
       (i    +   s   1   1   -   0   0   -)
       (u    +   s   1   3   +   0   0   -)
       (ii   +   l   1   1   -   0   0   -)
       (uu   +   l   2   3   +   0   0   -)
       (oo   +   l   3   2   -   0   0   -)
       (aa   +   l   3   1   -   0   0   -)
       (@@   +   l   2   2   -   0   0   -)
       (ai   +   d   3   1   -   0   0   -)
       (ei   +   d   2   1   -   0   0   -)
       (oi   +   d   3   3   -   0   0   -)
       (au   +   d   3   3   +   0   0   -)
       (ou   +   d   3   3   +   0   0   -)
       (e@   +   d   2   1   -   0   0   -)
       (i@   +   d   1   1   -   0   0   -)
       (u@   +   d   3   1   -   0   0   -)
       (@    +   a   -   -   -   0   0   -)
       (p    -   0   -   -   +   s   l   -)
       (t    -   0   -   -   +   s   a   -)
       (k    -   0   -   -   +   s   p   -)
       (b    -   0   -   -   +   s   l   +)
       (d    -   0   -   -   +   s   a   +)
       (g    -   0   -   -   +   s   p   +)
       (s    -   0   -   -   +   f   a   -)
       (z    -   0   -   -   +   f   a   +)
       (sh   -   0   -   -   +   f   p   -)
       (zh   -   0   -   -   +   f   p   +)
       (f    -   0   -   -   +   f   b   -)
       (v    -   0   -   -   +   f   b   +)
       (th   -   0   -   -   +   f   d   -)
       (dh   -   0   -   -   +   f   d   +)
       (ch   -   0   -   -   +   a   a   -)
       (jh   -   0   -   -   +   a   a   +)
       (h    -   0   -   -   +   a   v   -)
       (m    -   0   -   -   +   n   l   +)
       (n    -   0   -   -   +   n   d   +)
       (ng   -   0   -   -   +   n   v   +)
       (l    -   0   -   -   +   l   d   +)
       (y    -   0   -   -   +   l   a   +)
       (r    -   0   -   -   +   l   p   +)
       (w    -   0   -   -   +   l   b   +)
       (#    -   0   -   -   -   0   0   -) ))

There are no formal restrictions on how the features of each phone should be defined. Any phone that is defined as not a vowel (vc = -) and as having no consonant type (typ = 0) will be treated as a silence; in the `mrpa' definition above this is `#'. One such phoneme must be defined for the database unit selection code to work.
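
The silence rule can be sketched in Python as follows. This is an illustration under assumptions, not CHATR's implementation: it reads the rule as vc = - together with typ = 0, as the `#' entry in the `mrpa' definition suggests, and the function names are invented.

```python
# Three entries taken from the mrpa definition, reduced to the two
# features that matter for the silence rule.
MRPA_SUBSET = {
    "uh": {"vc": "+", "typ": "0"},   # vowel
    "p": {"vc": "-", "typ": "s"},    # stop consonant
    "#": {"vc": "-", "typ": "0"},    # neither vowel nor typed consonant
}


def is_silence(features):
    """A phone is a silence if it is not a vowel and has no consonant type."""
    return features["vc"] == "-" and features["typ"] == "0"


def silences(phoneme_set):
    """Return the names of all phones that count as silence."""
    return [name for name, f in phoneme_set.items() if is_silence(f)]


def require_silence(phoneme_set):
    """Raise if the set defines no silence (a choice made by this sketch)."""
    found = silences(phoneme_set)
    if not found:
        raise ValueError("no silence phoneme (vc = -, typ = 0) defined")
    return found


print(require_silence(MRPA_SUBSET))
# prints ['#']
```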

Phoneme Maps

Mapping allows modules which use different phoneme sets to have a reasonable chance of working together. An exact map will not always be possible, but a reasonable approximation is usually quite functional. The syntax of a phoneme map definition is

     (Phoneme Map [from-set name] [to-set name]
      ((From-Phoneme~1 To-Phoneme~1)... (From-Phoneme~n To-Phoneme~n)))

The maps are one way and the reverse map need not conform. All phonemes in the `from' set must be included in the map. An example mapping our `mrpa' definition to our `darpa' definition is

     (Phoneme Map mrpa darpa
     (
      (uh  uh)
      (e  eh)
      (a  aa)
      (o  aa)
      (i  ih)
      (u  uh)
      (ii iy)
      (uu uw)
      (oo ao)
      (aa aa)
      (@@ uh)
      (ai ay)
      (ei ey)
      (oi oy)
      (au aw)
      (ou ow)
      (e@ eh)
      (i@ ih)
      (u@ uh)
      (@  uh)
      (p  p)
      (t  t)
      (k  k)
      (b  b)
      (d  d)
      (g  g)
      (s  s)
      (z  z)
      (sh sh)
      (zh zh)
      (f  f)
      (v  v)
      (th th)
      (dh dh)
      (ch ch)
      (jh jh)
      (h  hh)
      (m  m)
      (n  n)
      (ng ng)
      (l  l)
      (y  y)
      (r  r)
      (w  w)
      (#  sil) ))
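
The two properties stated above, that every `from' phoneme must be mapped and that maps are one way, can be illustrated with a short Python sketch. The pairs are copied from the `mrpa' to `darpa' map; the function names and the dictionary representation are invented for the example and are not CHATR's.

```python
# A subset of the mrpa -> darpa pairs from the map above.
MRPA_TO_DARPA = {
    "a": "aa", "o": "aa", "aa": "aa",
    "e": "eh", "ii": "iy", "p": "p", "h": "hh", "#": "sil",
}
MRPA_SUBSET = ["a", "o", "aa", "e", "ii", "p", "h", "#"]
DARPA_SUBSET = {"aa", "eh", "iy", "p", "hh", "sil"}


def check_map(from_set, to_set, mapping):
    """Return (from-phonemes with no mapping, targets missing from to-set)."""
    missing = [p for p in from_set if p not in mapping]
    unknown = sorted({t for t in mapping.values() if t not in to_set})
    return missing, unknown


def map_phonemes(phonemes, mapping):
    """Translate a phoneme sequence symbol by symbol."""
    return [mapping[p] for p in phonemes]


def collisions(mapping):
    """Return targets reached from more than one source phoneme."""
    sources = {}
    for src, dst in mapping.items():
        sources.setdefault(dst, []).append(src)
    return {dst: srcs for dst, srcs in sources.items() if len(srcs) > 1}


print(check_map(MRPA_SUBSET, DARPA_SUBSET, MRPA_TO_DARPA))
print(map_phonemes(["h", "a", "p", "#"], MRPA_TO_DARPA))
print(collisions(MRPA_TO_DARPA))
```

Because `a', `o' and `aa' all map to `aa', no reverse map can recover the original symbol, which is why a separately defined reverse map need not mirror the forward one.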

Automatic Mapping

There are four built-in points where the phoneme set may be independently specified. Mappings will occur if maps are defined.

CHATR has a notion of an internal phoneme set. This is viewed as the basic set for internal operation for which mapping to and from may be required. The internal set defaults to the first phoneme set defined after CHATR starts. Thus it is usually set by the default speaker. The internal set may be explicitly set using the command

   (Phoneme Internal_Set set-name)

An input phoneme set may also be set explicitly. If the input set differs from the internal set, all phonemes read in utterances are mapped to the internal set at synthesis time. This allows, for instance, `darpa' segments to be loaded into a `mrpa' based synthesis environment.

A lexicon may have its own phoneme set. This allows lexicons utilizing different phoneme sets to be transportable between synthesizers. Phonemes for words supplied by the lexicon are mapped to the internal phoneme set at look-up time.

A unit database has its own phoneme set. At unit selection time, if the database phoneme set differs from the internal set, a mapping will be made if such a map has been defined.
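
The common pattern in the cases above (map to or from the internal set only when the sets differ and a map exists) might be sketched in Python as follows. This is a hypothetical model, not CHATR code: the function `to_internal' and the `MAPS' table are invented, and raising an error when no map is defined is a choice of the sketch, since the behaviour in that case is not described here.

```python
# Maps keyed by (from-set, to-set); the pairs come from the mrpa -> darpa map.
MAPS = {
    ("mrpa", "darpa"): {"h": "hh", "e": "eh", "p": "p", "#": "sil"},
}


def to_internal(phonemes, source_set, internal_set, maps):
    """Return phonemes expressed in the internal set, mapping only if needed."""
    if source_set == internal_set:
        return list(phonemes)          # same set: nothing to do
    mapping = maps.get((source_set, internal_set))
    if mapping is None:
        # Assumption of this sketch: treat a missing map as an error.
        raise LookupError(f"no map defined from {source_set} to {internal_set}")
    return [mapping[p] for p in phonemes]


# A lexicon using mrpa, looked up in a session whose internal set is darpa.
print(to_internal(["h", "e", "p"], "mrpa", "darpa", MAPS))
# prints ['hh', 'eh', 'p']
```

Since maps are one way, the table holds an entry per direction; the example defines only `mrpa' to `darpa', so the reverse request fails in this sketch.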

This model of phoneme sets is still too weak. Each utterance should be associated with its own phoneme set (i.e. its own internal set), since utterances with different internal phoneme sets may be used in the same session. The system can currently support multiple utterances using different phoneme sets, but not as neatly as it should.
