
Phoneme Sets

Before a phoneme set can be used it must be formally defined. Any set of symbols may be used. A definition should include phonological features, so the system can use that information if required.

Mapping between different phoneme sets may also be specified, making CHATR that much more flexible. Maps enable lexicons, synthesizers and other parts of a system using different phoneme sets to work together, though performance can be less than optimal.

Currently the library directory includes definitions for the following phoneme sets

     mrpa
     beep
     Radio2
     darpa
     Holmes
     Japanese (nuuph)
     Korean
     Korean (H-code)
     German
     Chinese (Canton)

Mappings between most sets are also available.

Note that even when a set is defined, the definition may not match another person's view of what that phoneme set is. A different name for silence, for instance, or different case conventions(6) make a phoneme set formally different, even though the user views the two as the same. Phoneme set definitions are there to help, though they may seem frustrating at first and do need careful use. The good news is that the system validates phonemes and will find mistyped data--this is invaluable when building large lexicons or unit databases.

A change in how CHATR deals with phonemes has been proposed. Essentially it would expand the power of the current definitions by introducing `feature-based' descriptions. This would allow users to specify their own phoneme features and values, enabling more precise definitions. Mapping would also become `feature-based' rather than simply `atomic symbol'-based as it is at present.

Phoneme Set Definitions

A phoneme set definition has the following syntax

     (Phoneme Def [name] ((phone~1 features)... (phone~n features)))

The name is an atom such as `mrpa', or `beep'. Each phoneme definition consists of a name followed by eight features. The features are

vc: Vowel or consonant. + = vowel, - = consonant.
lng: Vowel length. s = short, l = long, d = diphthong, 0 = consonant. (The `mrpa' definition below also uses a, for the schwa `@'.)
h: Vowel height. 1 = high, 2 = mid, 3 = low, - = consonant.
fr: Vowel frontness. 1 = front, 2 = mid, 3 = back, - = consonant.
rnd: Lip rounding. + = rounded, - = not rounded.
typ: Consonant type. s = stop, f = fricative, a = affricate, n = nasal, l = liquid, 0 = vowel.
plc: Place of consonant articulation. l = labial, a = alveolar, p = palatal, b = labio-dental, d = dental, v = velar, 0 = vowel.
vox: Consonant voiced or unvoiced. + = voiced, - = unvoiced.
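
As a rough illustration of how the eight features line up with an entry, the following Python sketch splits one entry of a definition into named fields. It is not part of CHATR, which reads these entries in its own syntax; the names FEATURE_NAMES and decode_phoneme are hypothetical.

```python
# Illustrative only: CHATR itself reads these entries in its own Lisp-like
# syntax. FEATURE_NAMES and decode_phoneme are hypothetical helper names.
FEATURE_NAMES = ("vc", "lng", "h", "fr", "rnd", "typ", "plc", "vox")


def decode_phoneme(entry: str) -> tuple[str, dict[str, str]]:
    """Split an entry such as '(b - 0 - - + s l +)' into name and features."""
    fields = entry.strip().strip("()").split()
    if len(fields) != 1 + len(FEATURE_NAMES):
        raise ValueError(f"expected a name and 8 features, got: {entry!r}")
    name, values = fields[0], fields[1:]
    return name, dict(zip(FEATURE_NAMES, values))


name, features = decode_phoneme("(b    -   0   -   -   +   s   l   +)")
print(name, features["typ"], features["plc"], features["vox"])
```

Here `b' decodes as a stop (typ s) with labial place (plc l) that is voiced (vox +).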

A list of phonemes used in a particular set may be obtained using the command

     (Phoneme List phoneme-set-name)

Note that if the required set differs from the one used by the default speaker, a speaker using that set must be loaded (unless this has already been done in the current session) before CHATR will recognize it. Alternatively, to avoid the overhead of loading a speaker that may not actually be required, a phoneme set may be made accessible using the command

     (require 'phoneme-set-name)

A list of sets currently recognized may be obtained using the command

     (Phoneme)

Definitions for phoneme sets may be found in directory `$CHATR_ROOT/lib/data/'.

As an example, the whole definition for `mrpa' is currently

     (Phoneme Def mrpa
      ;name  vc lng  h  fr  rnd typ plc vox
      (
       (uh   +   s   2   3   -   0   0   -)
       (e    +   s   2   1   -   0   0   -)
       (a    +   s   3   1   -   0   0   -)
       (o    +   s   3   3   -   0   0   -)
       (i    +   s   1   1   -   0   0   -)
       (u    +   s   1   3   +   0   0   -)
       (ii   +   l   1   1   -   0   0   -)
       (uu   +   l   2   3   +   0   0   -)
       (oo   +   l   3   2   -   0   0   -)
       (aa   +   l   3   1   -   0   0   -)
       (@@   +   l   2   2   -   0   0   -)
       (ai   +   d   3   1   -   0   0   -)
       (ei   +   d   2   1   -   0   0   -)
       (oi   +   d   3   3   -   0   0   -)
       (au   +   d   3   3   +   0   0   -)
       (ou   +   d   3   3   +   0   0   -)
       (e@   +   d   2   1   -   0   0   -)
       (i@   +   d   1   1   -   0   0   -)
       (u@   +   d   3   1   -   0   0   -)
       (@    +   a   -   -   -   0   0   -)
       (p    -   0   -   -   +   s   l   -)
       (t    -   0   -   -   +   s   a   -)
       (k    -   0   -   -   +   s   p   -)
       (b    -   0   -   -   +   s   l   +)
       (d    -   0   -   -   +   s   a   +)
       (g    -   0   -   -   +   s   p   +)
       (s    -   0   -   -   +   f   a   -)
       (z    -   0   -   -   +   f   a   +)
       (sh   -   0   -   -   +   f   p   -)
       (zh   -   0   -   -   +   f   p   +)
       (f    -   0   -   -   +   f   b   -)
       (v    -   0   -   -   +   f   b   +)
       (th   -   0   -   -   +   f   d   -)
       (dh   -   0   -   -   +   f   d   +)
       (ch   -   0   -   -   +   a   a   -)
       (jh   -   0   -   -   +   a   a   +)
       (h    -   0   -   -   +   a   v   -)
       (m    -   0   -   -   +   n   l   +)
       (n    -   0   -   -   +   n   d   +)
       (ng   -   0   -   -   +   n   v   +)
       (l    -   0   -   -   +   l   d   +)
       (y    -   0   -   -   +   l   a   +)
       (r    -   0   -   -   +   l   p   +)
       (w    -   0   -   -   +   l   b   +)
       (#    -   0   -   -   -   0   0   -) ))

There are no formal restrictions on how the features of each phone should be defined. Note, however, that at least one silence phoneme must be defined for the database unit selection code to work. Silence is marked by a deliberately contradictory entry: vc is - (not a vowel) while typ is 0 (the value otherwise used for vowels), as in the `#' entry above.
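
The silence convention can be sketched in Python as follows. This is an illustration only, not CHATR code: is_silence is a hypothetical helper, and the three entries are copied from the `mrpa' definition above.

```python
# Illustrative only; not CHATR code. is_silence is a hypothetical helper.
# Each entry is (name, vc, lng, h, fr, rnd, typ, plc, vox), as in the
# `mrpa' definition above.
ENTRIES = [
    ("a", "+", "s", "3", "1", "-", "0", "0", "-"),
    ("p", "-", "0", "-", "-", "+", "s", "l", "-"),
    ("#", "-", "0", "-", "-", "-", "0", "0", "-"),
]


def is_silence(entry: tuple[str, ...]) -> bool:
    """True when vc is '-' (not a vowel) yet typ is '0' (the vowel value)."""
    vc, typ = entry[1], entry[6]
    return vc == "-" and typ == "0"


silences = [entry[0] for entry in ENTRIES if is_silence(entry)]
if not silences:
    raise ValueError("no silence phoneme defined")
print(silences)
```

Only `#' satisfies both conditions, so a set lacking such an entry would be caught by the final check.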

Phoneme Maps

Mapping allows modules which use different phoneme sets to have a reasonable chance of working together. An exact map will not always be possible, but a reasonable approximation is typically quite functional. The syntax of a phoneme map definition is

     (Phoneme Map [from-set name] [to-set name]
      ((From-Phoneme~1 To-Phoneme~1)... (From-Phoneme~n To-Phoneme~n)))

Maps are one-way; the reverse map, if defined, need not be the inverse. All phonemes in the `from' set must be included in the map. An example mapping our `mrpa' definition to our `darpa' definition is

     (Phoneme Map mrpa darpa
     (
      (uh  uh)
      (e  eh)
      (a  aa)
      (o  aa)
      (i  ih)
      (u  uh)
      (ii iy)
      (uu uw)
      (oo ao)
      (aa aa)
      (@@ uh)
      (ai ay)
      (ei ey)
      (oi oy)
      (au aw)
      (ou ow)
      (e@ eh)
      (i@ ih)
      (u@ uh)
      (@  uh)
      (p  p)
      (t  t)
      (k  k)
      (b  b)
      (d  d)
      (g  g)
      (s  s)
      (z  z)
      (sh sh)
      (zh zh)
      (f  f)
      (v  v)
      (th th)
      (dh dh)
      (ch ch)
      (jh jh)
      (h  hh)
      (m  m)
      (n  n)
      (ng ng)
      (l  l)
      (y  y)
      (r  r)
      (w  w)
      (#  sil) ))
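
The two requirements on a map (every `from' phoneme is covered, and translation proceeds symbol by symbol) can be sketched in Python. This is an illustration under stated assumptions, not CHATR code: check_map and map_phonemes are hypothetical helpers, and only a small excerpt of the map above is used.

```python
# Illustrative only; not CHATR code. check_map and map_phonemes are
# hypothetical helpers, and the sets here are a small excerpt.
FROM_SET = ["uh", "e", "ii", "h", "#"]  # excerpt of `mrpa'
MRPA_TO_DARPA = {"uh": "uh", "e": "eh", "ii": "iy", "h": "hh", "#": "sil"}


def check_map(from_set: list[str], mapping: dict[str, str]) -> list[str]:
    """Return the phonemes of the `from' set that the map fails to cover."""
    return [p for p in from_set if p not in mapping]


def map_phonemes(phonemes: list[str], mapping: dict[str, str]) -> list[str]:
    """Translate a phoneme sequence symbol by symbol."""
    return [mapping[p] for p in phonemes]


assert check_map(FROM_SET, MRPA_TO_DARPA) == []
print(map_phonemes(["#", "h", "ii", "#"], MRPA_TO_DARPA))
```

In the full map above, `a', `o' and `aa' all map to `aa', so the map has no exact inverse; this is one reason a reverse map has to be defined separately.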

Automatic Mapping

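There are four built-in points where the phoneme set may be independently specified: the internal set, the input set, the lexicon, and the unit database. Where the sets differ, mapping occurs automatically, provided the relevant maps are defined.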

CHATR has a notion of an internal phoneme set: the basic set used for internal operation, to and from which other sets are mapped as required. The internal set defaults to the first phoneme set defined after CHATR starts, so it is usually determined by the default speaker. It may be set explicitly using the command

     (Phoneme Internal_Set set-name)

An input phoneme set may also be set explicitly. If the input set differs from the internal set, all phonemes read in utterances are mapped to the internal set at synthesis time. This allows, for instance, `darpa' segments to be loaded into a `mrpa' based synthesis environment. The command is

     (Phoneme Input_Set set-name)

A lexicon may have its own phoneme set. This allows lexicons utilizing different phoneme sets to be transportable between synthesizers. Phonemes for words supplied by the lexicon are mapped to the internal phoneme set at look-up time.

A unit database has its own phoneme set. At unit selection time, if the database phoneme set differs from the internal set, a mapping will be made if such a map has been defined.
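
The mapping of input or lexicon phonemes into the internal set might be modelled as in the Python sketch below. It is an illustration only, not CHATR code. The names MAPS and to_internal are hypothetical, the `darpa'-to-`mrpa' excerpt is invented for the example, and raising an error when no map is defined is an assumption, since the behaviour in that case is not described here.

```python
# Illustrative only; not CHATR code. MAPS, to_internal and the error raised
# for a missing map are assumptions made for this sketch.
MAPS = {
    # Invented excerpt of a darpa -> mrpa map, for illustration.
    ("darpa", "mrpa"): {"iy": "ii", "hh": "h", "sil": "#"},
}


def to_internal(phonemes, source_set, internal_set, maps=MAPS):
    """Map phonemes from source_set into internal_set when the sets differ."""
    if source_set == internal_set:
        return list(phonemes)  # same set: nothing to do
    mapping = maps.get((source_set, internal_set))
    if mapping is None:
        # What CHATR does here is not stated above; raising is an assumption.
        raise KeyError(f"no map defined from {source_set} to {internal_set}")
    return [mapping[p] for p in phonemes]


print(to_internal(["hh", "iy", "sil"], "darpa", "mrpa"))
```

Keying the maps by a (from, to) pair mirrors the fact that maps are one-way: a `darpa'-to-`mrpa' map says nothing about the reverse direction.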

This model of phoneme sets is still too weak. Each utterance should be associated with a phoneme set (i.e. its own internal set), since utterances with different internal phoneme sets may well be used in the same session. The system can currently support multiple utterances using different phoneme sets, but not as neatly as it should.
